VACCINE



FIELD OF INVENTION 

This invention relates to a general method for detecting pathogenic strains of
bacteria which harbour a type III secretion system, and for characterising the regions of the
chromosome of said strains where virulence genes reside. More particularly, this
invention relates to the method as applied to the pathogen Bordetella pertussis.
Furthermore, the invention relates to newly identified polynucleotides within these
regions, to the virulence polypeptides encoded by them, to the use of such polynucleotides
and polypeptides, and to their production. More particularly, the polynucleotides and
polypeptides of the present invention relate to the BopN, Orf1, Orf2, Orf3, Orf4, Orf5,
Orf6, Orf7, Orf8, Orf9, Orf10, Orf11, Orf12, Orf13, Orf14 and Orf15 effector proteins
of Bordetella pertussis.



BACKGROUND OF THE INVENTION

Type III secretion systems: 

Pathogenic bacteria invade many different niches in a broad host range and cause
a wide variety of syndromes. For this reason, it was previously believed that each
disease might be induced by a distinct molecular mechanism. However, the spectrum of
such mechanisms is not as broad as first imagined; rather, bacteria exploit a number of
common molecular tools to achieve a range of goals. Among these tools are type III
secretion systems, which provide a means for bacteria to target virulence factors directly
at host cells. These factors then tamper with host cell functions to the pathogens' benefit.


The type III export system is responsible for secretion of Salmonella and Shigella
invasion and virulence factors, enteropathogenic Escherichia coli (EPEC) signal
transduction molecules, virulence factors in several plant pathogens (for instance
Xanthomonas campestris pv. vesicatoria [Fenselau et al., 1992]) and Yop proteins in
Yersinia. The Yop export mechanism has been the most intensively investigated type III
secretion apparatus (see for instance: Allaoui et al., 1994; Bergman et al., 1994). In this
plasmid-encoded system, more than 20 different [...] cell envelope [...] are present.
Besides [...] the Yop proteins, which are the secreted substrates and [...] virulence.
[...] type III secretion systems originating from different [...]
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Pathogenicity island 
[...] virulence. Although they can compose type III secretion systems, [...]
do so. [...] However, numerous [...] similarly as when they are [...] functionally
related group. Such chromosomal virulence genes are often clustered [...] which
can be defined as compact, distinct genetic units carrying [...]; they are
often flanked by direct repeats, occupy large chromosomal regions (often > 30 kb) and
are present in pathogenic strains, whilst being absent or sporadically distributed in less- 
pathogenic (or non-pathogenic) strains of a bacterial species. These DNA segments are 
frequently associated with tRNA genes and/or insertion sequence (IS) elements at their 
boundaries. In addition, their G+C content often differs from that of host bacterial DNA, 
suggesting a foreign origin. 

Pathogenicity islands have been discovered in an increasing number of bacterial
pathogens, including different categories of E. coli, Salmonella typhimurium, Yersinia
spp., Helicobacter pylori, Vibrio cholerae, etc.

The first intensively studied pathogenicity islands were Pai I and Pai II, which
encode the haemolysin determinants of uropathogenic E. coli. These two Pais are
flanked by direct repeats and can be deleted from the chromosome at frequencies of 10^-4,
resulting in non-virulent mutant strains. Another pathogenicity island of 35 kb has
recently been identified on the chromosome of enteropathogenic E. coli (EPEC) and was
found to encode all known determinants involved in the so-called "attaching and
effacing" (AE) lesion formation. This region was therefore referred to as the "locus of
enterocyte effacement" (LEE). Despite the fact that uropathogenic and enteropathogenic
E. coli cause completely different infectious diseases, Pai I of the uropathogenic strains
and the LEE locus of EPEC are inserted at exactly the same position in the E. coli
chromosome.

While some authors support a definition of pathogenicity islands that
necessarily includes their chromosomal location, others have extended the concept to
blocks of virulence genes, regardless of their location in chromosomes, plasmids or
phages. The facts that, on the one hand, phages and plasmids can easily insert into and
excise from the chromosome and, on the other, that cryptic origins of plasmid replication
or phage-related sequences have been detected in Pais prompted us to adopt the latter,
less restrictive definition.
The pathogenicity islands (Pais) which code for a type III secretion system can be
divided into two classes of genes, I and II. Class I encompasses the genes [...]
(for instance Yersinia LcrD and [...]); [...] the genes encoding secreted effector
proteins [...]. The [...] of class [...] determinants is not well [...] between
class [...] of sequence similarity (Hueck, 1998).

[...] genes (class II) code for proteins which constitute the [...] present the best
biological vaccine [...].

The inventors have realised that the clustering of class I and class II genes inside
[...] and offers the opportunity of conveniently finding and [...]
using a known sequence of one of their numerous orthologues.



Bordetella pertussis 

Whooping cough is a disease caused by infection by Bordetella pertussis. It is a
serious and debilitating human disease, particularly in young children. Although whole
cell and acellular vaccines are available that are effective against the disease, there
remains a need for the identification of further highly purified pertussis proteins that
could be used in a more efficacious pertussis vaccine.

Although many pertussis virulence-associated factors are known, such as pertussis
toxin, filamentous haemagglutinin and pertactin, which have been included in various
acellular vaccines, there is no convenient genetic method for identifying further virulence
factors using the pertussis genome (short of laboriously sequencing the whole genome).
Although class I type III secretion system virulence genes have recently been shown to
exist in B. bronchiseptica and B. pertussis (Yuk et al., 1998), there has been no complete
analysis of a pathogenicity island in Bordetella, and the identity and characterisation of
effector genes within such a pathogenicity island have remained unknown until the present
invention.

SUMMARY OF THE INVENTION

In one aspect, the invention relates to a method for the identification of new
virulence genes in bacterial strains containing a type III secretion system. In particular,
the invention allows the identification of the effector virulence genes associated with a
pathogenicity island containing the genes for the type III secretion system. Another
aspect of the invention relates to a method for the identification of pathogenic bacterial
strains containing a type III secretion system. Another aspect of the invention relates to
the Bordetella pertussis BopN, Orf1, Orf2, Orf3, Orf4, Orf5, Orf6, Orf7, Orf8, Orf9, Orf10,
Orf11, Orf12, Orf13, Orf14 and Orf15 effector proteins, and the respective polynucleotide
sequences encoding them.

Although the general concepts of type III secretion systems and pathogenicity
islands have been reported, the problem of how to identify simply and reliably whether
any given organism has such cell machinery has not been solved until now. Such
a method is extremely useful to establish whether a given strain has a type III secretion
system within a pathogenicity island, to characterise unknown virulence genes within the
pathogenicity island, and for use in quick diagnostic methods for determining whether a
cultured bacterial strain containing a type III secretion system is pathogenic.

In the present invention, a novel, general method is described to achieve the
above aims. More specifically, the invention utilises a method that employs ideally
suited primers designed specifically from the sequence of the virulent Yersinia
enterocolitica lcrD gene as a target sequence. The presence of a type III secretion system
within a pathogenicity island in Bordetella pertussis was discovered, and every gene
within the pathogenicity island was characterised.
BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1. [...] cloned [...] amplicon [...] PCR and gene library screening are [...]
from [...] sequence, and listed specifically in Table [...].

Fig. 2. PileUp figure from the deduced amino acid sequences homologous to
Yersinia LcrD. Abbreviations: BbuFlhA = Borrelia burgdorferi FlhA; TpaFlhA =
Treponema pallidum FlhA; BsuFlhA = Bacillus subtilis FlhA; CjeFlbA = Campylobacter
jejuni FlbA; HpyFlhA = Helicobacter pylori FlhA; EcoFlhA = Escherichia coli FlhA;
StyFlhA = Salmonella typhimurium FlhA; YenFlhA = Yersinia enterocolitica FlhA;
PmiFlhA = Proteus mirabilis FlhA; Ccr[...] = Caulobacter crescentus [...]; EcoFhiA =
Escherichia coli FhiA; EamHrpI = Erwinia amylovora HrpI; [...] = Pseudomonas
syringae [...]; ECEPSepA = Enteropathogenic Escherichia coli SepA; StySsaV =
Salmonella typhimurium SsaV; RsoHrpO = Ralstonia solanacearum HrpO; XcaHrpC2 =
Xanthomonas campestris HrpC2; SflMxiA = Shigella flexneri MxiA; StyInvA =
Salmonella typhimurium InvA; PaePcrD = Pseudomonas aeruginosa PcrD; YenLcrD =
Yersinia enterocolitica LcrD; BpeBcrD = Bordetella pertussis BcrD; CpsTtsB =
Chlamydia psittaci TtsB.

Fig. 3. Organization of the Bordetella pertussis pathogenicity island (Pai). Four
housekeeping genes (hatched boxes) and the transposase gene of IS481 (black box)
surround the Pai. The Pai consists of genes coding for determinants involved in the
secretory apparatus and its regulation (class I genes, in grey boxes) as well as ORFs
which putatively code for effector proteins (class II genes, in white boxes). Letters
indicate the respective class I bsc genes, whereas numbers correspond to the class II
ORFs listed in Table 3.

Fig. 4. PileUp figure from the deduced amino acid sequences homologous to
Yersinia YscU. Abbreviations: BbuFlhB = Borrelia burgdorferi FlhB; TpaFlhB =
Treponema pallidum FlhB; EcoFlhB = Escherichia coli FlhB; StyFlhB = Salmonella
typhimurium FlhB; PmiFlhBpart = partial Proteus mirabilis FlhB; YenFlhB = Yersinia
enterocolitica FlhB; BsuFlhB = Bacillus subtilis FlhB; HpyFlhB = Helicobacter pylori
FlhB; AtuFlhB = Agrobacterium tumefaciens FlhB; CcrPodW = Caulobacter crescentus
PodW; SflSpa40 = Shigella flexneri Spa40; StySpaS = Salmonella typhimurium SpaS;
EcoEscU = Escherichia coli EscU; StySsaU = Salmonella typhimurium SsaU; BpeBscU
= Bordetella pertussis BscU; YenYscU = Yersinia enterocolitica YscU; RsoHrpN =
Ralstonia solanacearum HrpN; XcaOrfDpart = partial Xanthomonas campestris OrfO;
EamHrcU = Erwinia amylovora HrcU; EheHrcUpart = partial Erwinia herbicola HrcU;
PsyHrpY = Pseudomonas syringae HrpY; CpsOrf1 = Chlamydia psittaci Orf1.

Fig. 5. The DNA sequence of the Bordetella pertussis genome comprising the
type III secretion system pathogenicity island. Reference should be made to Tables 2, 3
and 4 and Fig. 3 for information regarding open reading frames.

DESCRIPTION OF THE INVENTION

Type III secretion systems identified to date are encoded by either chromosomal or
plasmidic pathogenicity island genes. However, nowhere in the prior art was it realised
that the conservation of genes encoding class I components of type III secretion systems
and the clustering of these genes with effector protein [...]
proteins would be potentially valuable in both vaccine and diagnostic fields.

Although the known sequence of a gene encoding any conserved (class I) type III
secretion machinery protein can be used in performing this invention, the lcrD gene
[...]. The chosen gene will act as a target for detecting unidentified pathogenicity
[...] bacterial species. The lcrD gene from Yersinia enterocolitica is preferred, [...] codes [...]

Classification of the LcrD family members can be split into two main [...], which
interestingly can be correlated with the functions assigned to these proteins [...].
One subfamily encompasses all the motility [...] proteins, while the [...]
[...] in Fig. 2 (and mentioned in Gyn et al. (1995) and Bogdanove et al. (1996)). Thus, [...]
classified as a virulence or a flagellar gene [...] the pathogenicity island [...].

A simple test would therefore define whether the search for other virulence genes on
the [...]

The preferred method for identifying unknown pathogenicity islands comprising
a type III secretion system is by:
i) identifying two highly conserved regions of the target protein sequence (preferably
of LcrD). Preferably, both regions should contain conserved amino acids which are
encoded by the fewest codon possibilities, e.g. methionine (ATG being the only
codon) or tryptophan (TGG being the only codon). This
minimises the number of permutations in both degenerate primer sets that are
designed in the next stage of the process, thus ensuring a greater probability that
each primer set will specifically anneal to the unknown lcrD-equivalent gene
(thereby minimising background non-specific interactions). Most preferably, regions
should also be chosen that are clearly distinguishable from the paralogous flhA
flagellar genes, present in all flagellated bacterial strains.

ii) designing a degenerate set of primers for both of the chosen regions such that a) the
primers are at least 15 bases long, preferably 20-30 bases long, and still more
preferably 21-23 bases long, b) they are degenerate at bases that can be more than
one type of nucleotide whilst still encoding the same amino acid (due to the
degeneracy of codon usage for amino acids), but no more degenerate than is
required to cover all permutations for the amino acid region selected, and c) the
primer set that encodes the more N-terminal region of the chosen protein should
correspond to the coding strand of its corresponding double-stranded DNA
sequence, and the set that encodes the more C-terminal region should correspond to
the complementary strand of the corresponding double-stranded DNA sequence.

iii) synthesising the degenerate primer sets of step ii) using conventional DNA synthesis 
methods well known in the art. 

iv) purifying the primer sets of step iii).

v) adding both the primer sets and a sample containing nucleic acid from a bacterial
strain (preferably a cell sample of the bacterial species itself) together in appropriate
quantities and in an appropriate buffer in order to perform a polymerase chain
reaction (PCR).

vi) performing a PCR in order to amplify the region of the gene between the
two primers (conditions for performing the PCR can be optimised using
techniques well known in the art).
vii) observing the reaction products on a gel (preferably an agarose gel) for an amplified
product of the size expected; if no such product is present, the bacterial strain is 
unlikely to use a type III secretion system; if such a product is present, the bacterial 
strain is likely to have a type III secretion system, and is likely to be pathogenic. 
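The primer design logic of steps i) and ii) can be illustrated with a short Python sketch, assuming the standard genetic code and IUPAC ambiguity codes. It scores peptide windows by the number of codon permutations (step i), back-translates a chosen window into a degenerate sense primer (step ii) b)), and reverse-complements it for the C-terminal primer (step ii) c)). The function names and the example peptides are hypothetical and are not the LcrD regions actually used. Position-by-position IUPAC encoding is a simplification: for serine, leucine and arginine it covers more sequences than the true codon set, so a real design may need to split such primers into sub-pools.

```python
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); amino acids listed with the
# first, second and third codon positions each cycling through T, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC ambiguity code for each set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Complement of every IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def codons_for(aa):
    """All codons encoding the one-letter amino acid `aa`."""
    return [codon for codon, res in CODON_TABLE.items() if res == aa]


def codon_permutations(peptide):
    """Number of distinct DNA sequences that encode `peptide` (step i)."""
    return prod(len(codons_for(aa)) for aa in peptide)


def best_window(protein, k):
    """(start, peptide, permutations) for the length-k window with fewest permutations."""
    windows = [
        (i, protein[i:i + k], codon_permutations(protein[i:i + k]))
        for i in range(len(protein) - k + 1)
    ]
    return min(windows, key=lambda w: w[2])


def degenerate_primer(peptide):
    """Back-translate `peptide` into a sense-strand primer, one IUPAC code per position."""
    bases = []
    for aa in peptide:
        codons = codons_for(aa)
        for pos in range(3):
            bases.append(IUPAC[frozenset(c[pos] for c in codons)])
    return "".join(bases)


def reverse_complement(primer):
    """Antisense (complementary-strand) version of an IUPAC primer, read 5' to 3'."""
    return primer.translate(COMPLEMENT)[::-1]


# Hypothetical peptides, for illustration only.
print(codon_permutations("MWC"))        # 2
print(degenerate_primer("MWC"))         # ATGTGGTGY
print(reverse_complement("ATGTGGTGY"))  # RCACCACAT
```

A window of seven residues gives a 21-base primer, within the 21-23 base range preferred in step ii) a).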


The preferred method for confirming that the amplified product actually
corresponds to a virulence gene is by carrying out steps i)-vii) above (where the target
protein is LcrD) and then:

viii) optionally separating the product of correct size from any background products of
incorrect size by removing the correct band from the gel, purifying the product by
conventional means, and amplifying the product once more with the two degenerate
primer sets in another PCR (preferably under more stringent PCR
conditions) (this step is required should the product of step vii) not be pure enough
for direct cloning).

ix) inserting the DNA [...] capable of
being sequenced [...]

x) comparing [...] to known members of
the LcrD/FlbF family [...] being part of
either a virulence or a flagellar gene.

And optionally: 

xi) using the internal sequence of the fragment to design primers that are the exact
sequence of, and specific to, the unknown lcrD-equivalent gene.

xii) using the primers of xi) firstly to screen a genomic library of the organism for
positive clones.

xiii) isolating the clones of xii), and sequencing one or more of said clones.

xiv) scanning the sequence of one clone (and overlapping sequences of other clones) to
search for an open reading frame which is approximately the same size as lcrD
(approximately 2100 bp), and encodes a protein homologous to LcrD.
xv) ascertaining whether the LcrD-equivalent protein is more homologous with the flbF
(flagellar protein secretion) gene family or the lcrD (type III secretion system
pathogenicity island) gene family.
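Step xiv) can be sketched as a simple open reading frame scan. The Python below is a minimal illustration, assuming an ORF runs from an ATG to the first in-frame stop codon and that "approximately the same size as lcrD" means within an arbitrary, hypothetical 15% of 2100 bp; real analyses would also consider alternative start codons and use a dedicated gene-finding tool. The helper names are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def orfs_on_strand(seq):
    """Yield (start, end) of ATG-to-stop ORFs in the three frames of `seq`.

    `end` is exclusive and includes the stop codon; an ORF is opened at the
    first ATG in a frame and closed at the next in-frame stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                yield start, i + 3
                start = None


def find_orfs(seq, target_bp=2100, tolerance=0.15):
    """ORFs on either strand whose length is within `tolerance` of `target_bp`.

    The 15% tolerance is an illustrative assumption, not a value from the method.
    Coordinates on the "-" strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    low, high = target_bp * (1 - tolerance), target_bp * (1 + tolerance)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for start, end in orfs_on_strand(s):
            if low <= end - start <= high:
                hits.append((strand, start, end, end - start))
    return hits


# Hypothetical test sequence: ATG + 698 alanine codons + TAA = 2100 bp.
example = "ATG" + "GCT" * 698 + "TAA"
print(find_orfs(example))  # [('+', 0, 2100, 2100)]
```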

The preferred method for characterising the whole pathogenicity island and 
defining unidentified virulence effector genes is by carrying out steps i)-xv) above 
(where the target protein is LcrD) and then: 

xvi) if the sequence is more homologous with the lcrD gene family, designing primers at
either extreme of the gene sequence already ascertained, and screening and
sequencing the genomic library (using a standard chromosome walking strategy,
where the insert boundaries of an original clone serve as a probe for screening and
cloning adjacent regions) to sequence eventually the whole of the pathogenicity
island (both boundaries of which will be defined by the presence of either direct or
inverted repeats, or insertion sequences, or the presence of housekeeping genes)

xvii) defining unidentified virulence effector genes within the sequenced pathogenicity 
island 

xviii) cloning, expressing and characterising the virulence genes of xvii) which encode 
virulence effector proteins of the organism 

Definitions

"Pertussis pathogenicity proteins" refers generally to polypeptides having the
amino acid sequence encoded by the genes defined in tables 2 and 3, or an allelic variant
thereof. These proteins are: BcrD, BcrH, BscC, BscD, BscE, BscF, BscI, BscJ, BscK,
BscL, BscN, BscO, BscP, BscQ, BscR, BscS, BscT, BscU, BscV, BrpL, BopN, Orf1,
Orf2, Orf3, Orf4, Orf5, Orf6, Orf7, Orf8, Orf9, Orf10, Orf11, Orf12, Orf13, Orf14,
Orf15.

"Pertussis pathogenicity genes" refers to polynucleotides having the nucleotide
sequence defined in tables 2 and 3, or allelic variants thereof and/or their complements.
These genes are: bcrD, bcrH, bscC, bscD, bscE, bscF, bscI, bscJ, bscK, bscL, bscN,
bscO, bscP, bscQ, bscR, bscS, bscT, bscU, bscV, brpL, bopN, orf1, orf2, orf3, orf4, orf5,
orf6, orf7, orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15.

"Polypeptide" refers to any peptide or protein comprising two or more amino
acids joined to each other by peptide bonds or modified peptide bonds, i.e. peptide
isosteres. "Polypeptide" refers to both short chains, commonly referred to as peptides,
oligopeptides or oligomers, and to longer chains, generally referred to as proteins.
Polypeptides may contain amino acids other than the 20 gene-encoded amino acids.
"Polypeptides" include amino acid sequences modified either by natural processes, such
as posttranslational processing, or by chemical modification techniques which are well
known in the art. Such modifications are well described in basic texts and in more
detailed monographs, as well as in a voluminous research literature. Modifications can
occur anywhere in a polypeptide, including the peptide backbone, the amino acid
side-chains and the amino or carboxyl termini. It will be appreciated that the same type of
modification may be present in the same or varying degrees at several sites in a given
polypeptide. Also, a given polypeptide may contain many types of modifications.
Polypeptides may be branched as a result of ubiquitination, and they may be cyclic, with
or without branching. Cyclic, branched and branched cyclic polypeptides may result
from posttranslational natural processes or may be made by synthetic methods.
Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent
attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a
nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative,
covalent attachment of phosphotidylinositol, cross-linking, cyclization, disulfide bond
formation, demethylation, formation of covalent cross-links, formation of cystine,
formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI
anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation,
proteolytic processing, phosphorylation, prenylation, racemization, selenoylation,
sulfation, transfer-RNA mediated addition of amino acids to proteins such as
arginylation, and ubiquitination. See, for instance, PROTEINS - STRUCTURE AND
MOLECULAR PROPERTIES, 2nd Ed., T. E. Creighton, W. H. Freeman and
Company, New York, 1993 and Wold, F., Posttranslational Protein Modifications:
Perspectives and Prospects, pgs. 1-12 in POSTTRANSLATIONAL COVALENT
MODIFICATION OF PROTEINS, B. C. Johnson, Ed., Academic Press, New York,
1983; Seifter et al., "Analysis for protein modifications and nonprotein cofactors", Meth
Enzymol (1990) 182:626-646 and Rattan et al., "Protein Synthesis: Posttranslational
Modifications and Aging", Ann NY Acad Sci (1992) 663:48-62.

"Polynucleotide" generally refers to any polyribonucleotide or
polydeoxyribonucleotide, which may be unmodified RNA or DNA or modified RNA or
DNA. "Polynucleotides" include, without limitation, single- and double-stranded DNA,
DNA that is a mixture of single- and double-stranded regions, single- and
double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions,
hybrid molecules comprising DNA and RNA that may be single-stranded or, more
typically, double-stranded or a mixture of single- and double-stranded regions. In addition,
"polynucleotide" refers to triple-stranded regions comprising RNA or DNA or both RNA
and DNA. The term polynucleotide also includes DNAs or RNAs containing one or
more modified bases and DNAs or RNAs with backbones modified for stability or for
other reasons. "Modified" bases include, for example, tritylated bases and unusual bases
such as inosine. A variety of modifications has been made to DNA and RNA; thus,
"polynucleotide" embraces chemically, enzymatically or metabolically modified forms
of polynucleotides as typically found in nature, as well as the chemical forms of DNA
and RNA characteristic of viruses and cells. "Polynucleotide" also embraces relatively
short polynucleotides, often referred to as oligonucleotides.

"Variant", as the term is used herein, is a polynucleotide or polypeptide that
differs from a reference polynucleotide or polypeptide respectively, but retains essential
properties. A typical variant of a polynucleotide differs in nucleotide sequence from
another, reference polynucleotide. Changes in the nucleotide sequence of the variant
may or may not alter the amino acid sequence of a polypeptide encoded by the reference
polynucleotide. Nucleotide changes may result in amino acid substitutions, additions,
deletions, fusions and truncations in the polypeptide encoded by the reference sequence,
as discussed below. A typical variant of a polypeptide differs in amino acid sequence
from another, reference polypeptide. Generally, differences are limited so that the
sequences of the reference polypeptide and the variant are closely similar overall and, in
many regions, identical. A variant and reference polypeptide may differ in amino acid
sequence by one or more substitutions (preferably conservative), additions and deletions
in any combination. A substituted or inserted amino acid residue may or may not be one
encoded by the genetic code. A variant of a polynucleotide or polypeptide may be
naturally occurring, such as an allelic variant, or it may be a variant that is not known to
occur naturally. Non-naturally occurring variants of polynucleotides and polypeptides
may be made by mutagenesis techniques or by direct synthesis. Variants should retain
one or more of the biological activities of the reference polypeptide. For instance, they
should have similar antigenic or immunogenic activities as the reference polypeptide.
Antigenicity can be tested in [...] immunoblot experiments, preferably using [...];
immunogenicity can be tested by measuring antibody responses (using polyclonal sera
generated against the variant polypeptide) [...]. Preferably, a variant would retain all
of the above biological activities.

"Identity" is a measure of the identity of nucleotide sequences or amino acid
sequences. In general, the sequences are aligned so that the highest order match is
obtained. "Identity" per se has an art-recognized meaning and can be calculated using
published techniques. See, e.g.: (COMPUTATIONAL MOLECULAR BIOLOGY,
Lesk, A.M., ed., Oxford University Press, New York, 1988; BIOCOMPUTING:
INFORMATICS AND GENOME PROJECTS, Smith, D.W., ed., Academic Press, New
York, 1993; COMPUTER ANALYSIS OF SEQUENCE DATA, PART I, Griffin, A.M.,
and Griffin, H.G., eds., Humana Press, New Jersey, 1994; SEQUENCE ANALYSIS IN
MOLECULAR BIOLOGY, von Heijne, G., Academic Press, 1987; and SEQUENCE
ANALYSIS PRIMER, Gribskov, M. and Devereux, J., eds., M Stockton Press, New
York, 1991). While there exist a number of methods to measure identity between two
polynucleotide or polypeptide sequences, the term "identity" is well known to skilled
artisans (Carrillo, H., and Lipman, D., SIAM J Applied Math (1988) 48:1073). Methods
commonly employed to determine identity or similarity between two sequences include,
but are not limited to, those disclosed in Guide to Huge Computers, Martin J. Bishop,
ed., Academic Press, San Diego, 1994, and Carrillo, H., and Lipman, D., SIAM J Applied
Math (1988) 48:1073. Methods to determine identity and similarity are codified in
computer programs. Preferred computer program methods to determine identity and
similarity between two sequences include, but are not limited to, the GCG program package
(Devereux, J., et al., Nucleic Acids Research (1984) 12(1):387), BLASTP, BLASTN and
FASTA (Altschul, S.F. et al., J Molec Biol (1990) 215:403). Most preferably, the
program used to determine identity levels was the GAP program, as was used in the
Examples below.
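The GAP program is based on the Needleman-Wunsch global alignment algorithm. As a rough illustration only, the Python sketch below computes percent identity from a global alignment with a linear gap penalty and hypothetical scores (match +1, mismatch -1, gap -2); it does not reproduce GAP's scoring matrices or affine gap penalties. Identity is expressed relative to the reference length, as in the definition of percentage identity given in this section. The function name is illustrative.

```python
def global_identity(query, reference, match=1, mismatch=-1, gap=-2):
    """Percent identity of `query` to `reference` from a Needleman-Wunsch alignment.

    Uses a linear gap penalty and illustrative scores; identical aligned
    positions are divided by the reference length.
    """
    n, m = len(query), len(reference)
    # score[i][j] = best score for aligning query[:i] with reference[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if query[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical aligned residues.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        same = query[i - 1] == reference[j - 1]
        if score[i][j] == score[i - 1][j - 1] + (match if same else mismatch):
            identical += same
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 100.0 * identical / len(reference)


print(global_identity("ACGA", "ACGT"))  # 75.0
```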

As an illustration, a polynucleotide having a nucleotide sequence with at
least, for example, 95% "identity" to a reference nucleotide sequence is one whose
nucleotide sequence is identical to the reference sequence except that it may include on
average up to five point mutations per 100 nucleotides of the reference nucleotide
sequence. In other words, to obtain a polynucleotide having a nucleotide sequence at
least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the
reference sequence may be deleted or substituted with another nucleotide, or a number of
nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted
into the reference sequence. These mutations of the reference sequence may occur at the
5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those
terminal positions, interspersed either individually among nucleotides in the reference
sequence or in one or more contiguous groups within the reference sequence.
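The arithmetic of this definition can be made explicit. In the Python sketch below (hypothetical function names), the number of permitted point mutations is computed against the reference length using integer arithmetic, assuming the identity threshold is a whole-number percentage. For example, at 95% identity a 100-nucleotide reference tolerates 5 mutations, and a 2100-nucleotide reference, roughly the size of lcrD, tolerates 105.

```python
def max_differences(reference_length, identity_percent):
    """Largest number of point mutations compatible with `identity_percent` identity.

    Substitutions, deletions and insertions are all counted against the
    reference length; `identity_percent` is assumed to be a whole number.
    """
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    return reference_length * (100 - identity_percent) // 100


def satisfies_identity(mutations, reference_length, identity_percent):
    """True if `mutations` point mutations stay within the identity threshold."""
    return mutations <= max_differences(reference_length, identity_percent)


print(max_differences(100, 95))   # 5
print(max_differences(2100, 95))  # 105
```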



Polypeptides of the invention
In one aspect, the present invention relates to Bordetella pathogenicity proteins (or
polypeptides). The Bordetella pathogenicity polypeptides include the polypeptides
encoded by the genes defined in tables 2 and 3; polypeptides comprising the
amino acid sequence encoded by the genes defined in tables 2 and 3; and polypeptides
comprising an amino acid sequence which has at least 75% identity to that encoded by
the genes defined in tables 2 and 3 over their entire length, preferably at least 80%
identity, and more preferably at least 90% identity. Those with 95-99% identity are
highly preferred.

The Bordetella pathogenicity polypeptides may be in the form of the "mature"
protein or may be a part of a larger protein such as a fusion protein. It may be
advantageous to include an additional amino acid sequence which contains secretory or
leader sequences, pro-sequences, sequences which aid in purification such as multiple
histidine residues, or an additional sequence for stability during recombinant production.

Fragments of the Bordetella pathogenicity polypeptides are also included in the
invention. A fragment is a polypeptide having an amino acid sequence that is the same as
part, but not all, of the amino acid sequence of the aforementioned Bordetella pathogenicity
polypeptides. As with Bordetella pathogenicity polypeptides, fragments may be
"free-standing," or comprised within a larger polypeptide of which they form a part or region,
most preferably as a single continuous region. Representative examples of polypeptide
fragments of the invention include, for example, fragments from about amino acid number
1-20, 21-40, 41-60, 61-80, 81-100, and 101 to the end of a Bordetella pathogenicity
polypeptide. In this context "about" includes the particularly recited ranges larger or
smaller by several, 5, 4, 3, 2 or 1 amino acids at either extreme or at both extremes.
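The recited fragment ranges can be enumerated mechanically. The Python sketch below, with hypothetical helper names, lists the nominal 1-based windows for a polypeptide of a given length and tests whether a candidate fragment falls "about" a window, assuming a slack of up to 5 residues at either extreme as stated above.

```python
def nominal_windows(protein_length):
    """1-based inclusive ranges 1-20, 21-40, 41-60, 61-80, 81-100, then 101 to the end."""
    windows = [(s, min(s + 19, protein_length))
               for s in range(1, 101, 20) if s <= protein_length]
    if protein_length >= 101:
        windows.append((101, protein_length))
    return windows


def is_about(fragment, window, slack=5):
    """True if each end of `fragment` lies within `slack` residues of the window's end."""
    return (abs(fragment[0] - window[0]) <= slack
            and abs(fragment[1] - window[1]) <= slack)


print(nominal_windows(130))
# [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100), (101, 130)]
```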

Preferred fragments include, for example, truncation polypeptides having the amino
acid sequence of Bordetella pathogenicity polypeptides, except for deletion of a continuous
series of residues that includes the amino terminus, or a continuous series of residues that
includes the carboxyl terminus and/or transmembrane region, or deletion of two continuous
series of residues, one including the amino terminus and one including the carboxyl
terminus. Also preferred are fragments characterized by structural or functional attributes
such as fragments that comprise alpha-helix and alpha-helix-forming regions, beta-sheet
and beta-sheet-forming regions, turn and turn-forming regions, coil and coil-forming
regions, hydrophilic regions, hydrophobic regions, alpha amphipathic regions, beta
amphipathic regions, flexible regions, surface-forming regions, substrate binding regions,
and high antigenic index regions. Other preferred fragments are biologically active
fragments. Biologically active fragments are those that mediate Bordetella pathogenicity
protein activity, including those with a similar activity or an improved activity, or with a
decreased undesirable activity. Also included are those that are antigenic or immunogenic
in an animal, especially in a human.

Preferably, all of these polypeptide fragments retain the biological activity (for
instance antigenic or immunogenic activity) of the Bordetella pathogenicity protein.
Variants of the defined sequence and fragments also form part of the
present invention. Preferred variants are those that vary from the referents by conservative
amino acid substitutions, i.e., those that substitute a residue with another of like
characteristics. Typical such substitutions are among Ala, Val, Leu and Ile; among Ser and
Thr; among the acidic residues Asp and Glu; among Asn and Gln; among the basic
residues Lys and Arg; and among the aromatic residues Phe and Tyr. Particularly preferred are variants
in which several, 5-10, 1-5, or 1-2 amino acids are substituted, deleted, or added in any
combination. Most preferred variants are naturally occurring allelic variants of the Bordetella
pathogenicity polypeptides present in strains of Bordetella pertussis.
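The conservative-substitution groups listed above can be applied mechanically when a variant is compared with its referent. The following is a minimal sketch, assuming the two polypeptides are already aligned without gaps and written in one-letter code. The example sequences are hypothetical.

```python
# Groups of residues "of like characteristics", as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("AVLI"),  # Ala, Val, Leu, Ile
    set("ST"),    # Ser, Thr
    set("DE"),    # acidic: Asp, Glu
    set("NQ"),    # Asn, Gln
    set("KR"),    # basic: Lys, Arg
    set("FY"),    # aromatic: Phe, Tyr
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b differ but belong to the same group."""
    return a != b and any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def list_substitutions(reference: str, variant: str):
    """List (position, ref_residue, variant_residue, conservative?) for an ungapped comparison.

    Only substitutions are handled. Insertions and deletions would need an alignment first.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


# Hypothetical sequences, for illustration only.
print(list_substitutions("MAVSDK", "MAISEK"))  # [(3, 'V', 'I', True), (5, 'D', 'E', True)]
```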

The Bordetella pathogenicity polypeptides of the invention can be prepared in any
suitable manner. Such polypeptides include isolated naturally occurring polypeptides,
recombinantly produced polypeptides, synthetically produced polypeptides, or polypeptides
produced by a combination of these methods. Means for preparing such polypeptides are
well understood in the art.

Polynucleotides of the invention

Another aspect of the invention relates to Bordetella pathogenicity polynucleotides. 
Bordetella pathogenicity polynucleotides include isolated polynucleotides which encode 
the Bordetella pathogenicity polypeptides and fragments respectively, and polynucleotides 
closely related thereto. More specifically, Bordetella pathogenicity polynucleotides of the 
invention include a polynucleotide comprising the nucleotide sequence of genes defined in 
table 2 or 3, encoding a Bordetella pathogenicity polypeptide. Bordetella pathogenicity 
polynucleotides further include a polynucleotide comprising a nucleotide sequence that has 
at least 75% identity over its entire length to a nucleotide sequence encoding the Bordetella 
pathogenicity polypeptide encoded by the genes defined in tables 2 and 3, and a 
polynucleotide comprising a nucleotide sequence that is at least 75% identical to that of 
the genes defined in tables 2 and 3. In this regard, polynucleotides at least 80% identical 
are particularly preferred, and those with at least 90% are especially preferred.
Furthermore, those with at least 98-99% are most highly preferred, with at least 99% being
the most preferred. Also included under Bordetella pathogenicity polynucleotides is a
nucleotide sequence which has sufficient identity to a nucleotide sequence of the genes
defined in tables 2 and 3 to hybridize under conditions usable for amplification or for
use as a probe or marker. The invention also
provides polynucleotides which are complementary to such Bordetella pathogenicity 
polynucleotides. 
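The identity thresholds above (75%, 80%, 90%, 98-99%) presuppose a way of scoring identity over the entire length. The text does not fix the algorithm, so the following is only a sketch: a Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, gap -2), with identity taken as identical aligned positions divided by alignment columns. Both the scores and that normalisation are assumptions. Programs such as BLAST or the GCG package mentioned in Example 1 may report identity differently.

```python
def global_percent_identity(query: str, reference: str,
                            match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Percent identity of one optimal global (Needleman-Wunsch) alignment.

    Identity = identical aligned positions / alignment columns (gap columns included).
    """
    if not query or not reference:
        raise ValueError("sequences must be non-empty")
    n, m = len(query), len(reference)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if query[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting alignment columns and identical pairs.
    i, j, columns, identical = n, m, 0, 0
    while i > 0 and j > 0:
        pair = match if query[i - 1] == reference[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            if query[i - 1] == reference[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    columns += i + j  # leading gap columns left over after the loop
    return 100.0 * identical / columns


identity = global_percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, identity >= 75.0)  # 90.0 True
```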

The nucleotide sequence encoding Bordetella pathogenicity polypeptide encoded 
by the genes defined in tables 2 and 3 may be identical to the polypeptide encoding 
sequence contained in the genes defined in tables 2 or 3, or it may be a sequence which, as
a result of the redundancy (degeneracy) of the genetic code, also encodes the polypeptide
encoded by the genes defined in tables 2 and 3 respectively. 
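The degeneracy argument can be made concrete with the standard genetic code, under which different codon choices yield the same polypeptide. The following sketch builds the standard codon table, shows two different DNA strings translating to the same peptide, and counts how many coding sequences that peptide has. The peptide and DNA strings are hypothetical examples, not sequences from tables 2 or 3.

```python
from collections import Counter
from math import prod

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of synonymous codons for each amino acid.
CODONS_PER_AMINO_ACID = Counter(CODON_TABLE.values())


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def number_of_encodings(protein: str) -> int:
    """Count distinct DNA sequences (stop codon excluded) that encode `protein`."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in protein)


print(translate("ATGAAACTG"), translate("ATGAAGTTA"))  # MKL MKL
print(number_of_encodings("MKL"))                     # 12 (1 x 2 x 6)
```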

When the polynucleotides of the invention are used for the recombinant
production of Bordetella pathogenicity polypeptide, the polynucleotide may include the 
coding sequence for the mature polypeptide or a fragment thereof, by itself; the coding 
sequence for the mature polypeptide or fragment in reading frame with other coding
sequences, such as those encoding a leader or secretory sequence, a pre-, or pro- or prepro- 
protein sequence, or other fusion peptide portions. For example, a marker sequence which 
facilitates purification of the fused polypeptide can be encoded. In certain preferred 
embodiments of this aspect of the invention, the marker sequence is a hexa-histidine 
peptide, as provided in the pQE vector (Qiagen, Inc.) and described in Gentz et al., Proc
Natl Acad Sci USA (1989) 86:821-824, or is an HA tag, or is glutathione S-transferase. The
polynucleotide may also contain non-coding 5' and 3' sequences, such as transcribed, non- 
translated sequences, splicing and polyadenylation signals, ribosome binding sites and 
sequences that stabilize mRNA. 

Further preferred embodiments are polynucleotides encoding Bordetella 
pathogenicity protein variants comprising the amino acid sequence of the Bordetella 
pathogenicity polypeptide encoded by the genes defined by tables 2 and 3 respectively in 
which several, 10-25, 5-10, 1-5, 1-3, 1-2 or 1 amino acid residues are substituted, deleted or 
added, in any combination. Most preferred variant polynucleotides are those naturally 
occurring Bordetella pertussis sequences that encode allelic variants of the Bordetella 
pathogenicity proteins in B. pertussis.

The present invention further relates to polynucleotides that hybridize to the herein
above-described sequences. In this regard, the present invention especially relates to
polynucleotides which hybridize under stringent conditions to the herein above-described
polynucleotides. As herein used, the term "stringent conditions" means hybridization will
occur only if there is at least 80%, and preferably at least 90%, and more preferably at least
95%, yet even more preferably 97-99% identity between the sequences.

Polynucleotides of the invention, which are identical or sufficiently identical to a
nucleotide sequence of any gene defined in tables 2 and 3 or a fragment thereof, may be
used as hybridization probes for cDNA and genomic DNA, to isolate full-length cDNAs
and genomic clones encoding Bordetella pathogenicity polypeptides, and to
isolate cDNA and genomic clones of other genes (including genes encoding homologs and
orthologs from species other than Bordetella pertussis) that have a high sequence similarity
to the Bordetella pathogenicity genes. Such hybridization techniques are known to those of
skill in the art. Typically these nucleotide sequences are 80% identical, preferably 90%
identical, more preferably 95% identical to that of the referent. The probes generally will
comprise at least 15 nucleotides. Preferably, such probes will have at least 30 nucleotides
and may have at least 50 nucleotides. Particularly preferred probes will range between 30
and 50 nucleotides. In one embodiment, a method to obtain a polynucleotide encoding a
Bordetella pathogenicity polypeptide, including homologs and orthologs from species other than
Bordetella pertussis, comprises the steps of screening an appropriate library under stringent
hybridization conditions with a labeled probe having a nucleotide sequence contained in
one of the gene sequences defined by tables 2 and 3, or a fragment thereof; and isolating
full-length cDNA and genomic clones containing said polynucleotide sequence. Thus in
another aspect, Bordetella pathogenicity polynucleotides of the present invention further
include a nucleotide sequence comprising a nucleotide sequence that hybridizes under
stringent conditions to a nucleotide sequence contained in one
of the genes defined by tables 2 and 3, or a fragment thereof. Also included with Bordetella
pathogenicity polypeptides are polypeptides comprising amino acid sequences encoded by
nucleotide sequences obtained under the above hybridization conditions. Stringent hybridization conditions
are as defined above or, alternatively, overnight incubation at 42°C in a
solution comprising: 50% formamide, 5x SSC (1x SSC is 150 mM NaCl, 15 mM trisodium citrate), 50
mM sodium phosphate (pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20
micrograms/ml denatured, sheared salmon sperm DNA, followed by washing the filters in
0.1x SSC at about 65°C.
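Selecting probes in the preferred 30-50 nucleotide range from a gene sequence can be sketched as a sliding-window scan. This is a minimal illustration under stated assumptions: windows are taken at a fixed step, and the melting temperature uses the basic GC-content approximation Tm = 64.9 + 41(nGC - 16.4)/N. That approximation ignores the formamide and SSC in the buffer above, so it is only useful for ranking candidates. The function names, step size and GC limits are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def estimate_tm(probe: str) -> float:
    """Rough Tm in deg C from 64.9 + 41 * (nGC - 16.4) / N (salt and formamide ignored)."""
    probe = probe.upper()
    n_gc = probe.count("G") + probe.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(probe)


def candidate_probes(gene: str, length: int = 30, step: int = 10,
                     gc_min: float = 0.0, gc_max: float = 1.0):
    """Return (1-based start, probe, GC fraction, estimated Tm) for each window kept."""
    if length < 15:
        raise ValueError("the text asks for probes of at least 15 nucleotides")
    gene = gene.upper()
    results = []
    for start in range(0, len(gene) - length + 1, step):
        probe = gene[start:start + length]
        gc = gc_fraction(probe)
        if gc_min <= gc <= gc_max:
            results.append((start + 1, probe, gc, estimate_tm(probe)))
    return results


toy_gene = "ATGC" * 20  # hypothetical 80 nt sequence, not a Bordetella gene
print(len(candidate_probes(toy_gene)))                              # 6
print([c[0] for c in candidate_probes(toy_gene, gc_min=0.5)])      # [11, 31, 51]
```

Bordetella DNA is GC-rich, as the Fig. 5 sequence illustrates. GC cut-offs borrowed from other organisms could therefore reject most windows, which is why the defaults here filter nothing unless limits are passed explicitly.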

The polynucleotides and polypeptides of the present invention may be employed as
research reagents and materials for discovery of treatments and diagnostics for animal and
human disease.

Diagnostic Assays

This invention also relates to the use of Bordetella pathogenicity polypeptides, or
Bordetella pathogenicity polynucleotides, as diagnostic reagents. Detection of
Bordetella pathogenicity polypeptides will provide a diagnostic tool that can add to or
define a diagnosis of B. pertussis disease, among others.

Materials for diagnosis may be obtained from a subject's cells, such as from blood,
urine, saliva or tissue biopsy.

Thus in another aspect, the present invention relates to a diagnostic kit for a
disease or susceptibility to a disease, particularly B. pertussis disease, which comprises:

(a) a Bordetella pathogenicity polynucleotide, preferably the nucleotide sequence of one 
of the gene sequences defined by tables 2 and 3, or a fragment thereof;

(b) a nucleotide sequence complementary to that of (a); 

(c) a Bordetella pathogenicity polypeptide, preferably the polypeptide encoded by one of
the gene sequences defined in tables 2 and 3, or a fragment thereof; 

(d) an antibody to a Bordetella pathogenicity polypeptide, preferably to the polypeptide 
encoded by one of the gene sequences defined in tables 2 and 3; or 

(e) a phage displaying an antibody to a Bordetella pathogenicity polypeptide, preferably 
to the polypeptide encoded by one of the gene sequences defined in tables 2 and 3.

It will be appreciated that in any such kit, (a), (b), (c), (d) or (e) may comprise a 
substantial component. 

Vaccines

Another aspect of the invention relates to a method for inducing an 
immunological response in a mammal which comprises inoculating the mammal with 
Bordetella pathogenicity polypeptide or epitope-bearing fragments, analogs, outer- 
membrane vesicles or cells (attenuated or otherwise), adequate to produce antibody and/or 
T cell immune response to protect said animal from B. pertussis disease, among others. In
particular, the invention relates to the use of Bordetella pathogenicity polypeptides
encoded by the genes defined in table 3, the effector proteins. Yet another aspect of the
invention relates to a method of inducing an immunological response in a mammal, which
comprises delivering Bordetella pathogenicity polypeptide via a vector directing
expression of Bordetella pathogenicity polynucleotide in vivo in order to induce such an
immunological response to produce antibody to protect said animal from diseases.

A further aspect of the invention relates to an immunological composition or
vaccine formulation which, when introduced into a mammalian host, induces an
immunological response in that mammal to a Bordetella pathogenicity polypeptide,
preferably one encoded by a gene defined in table 3, wherein the composition
comprises a Bordetella pathogenicity gene, or Bordetella pathogenicity polypeptide or
epitope-bearing fragments, analogs, outer-membrane vesicles or cells (attenuated or
otherwise). The vaccine formulation may further comprise a suitable carrier. The
formulation is preferably administered parenterally (including subcutaneous,
intramuscular, intravenous, intradermal, etc. injection). Formulations suitable for
parenteral administration include aqueous and non-aqueous sterile injection solutions
which may contain anti-oxidants, buffers, bacteriostats and solutes which render the
formulation isotonic with the blood of the recipient; and aqueous and non-aqueous sterile
suspensions which may include suspending agents or thickening agents. The formulations
may be presented in unit-dose or multi-dose containers, for example, sealed ampoules and
vials, and may be stored in a freeze-dried condition requiring only the addition of the
sterile liquid carrier immediately prior to use. The vaccine formulation may also include
adjuvant systems for enhancing the immunogenicity of the formulation, such as
oil-in-water systems and other systems known in the art. The dosage will depend on the
specific activity of the vaccine and can be readily determined by routine experimentation.

Yet another aspect relates to an immunological/vaccine formulation which
comprises the polynucleotide of the invention. Such techniques are known in the art, see 
for example Wolff et al, Science, (1990) 247: 1465-8. 

EXAMPLES

The examples below are carried out using standard techniques, which are well 
known and routine to those of skill in the art, except where otherwise described in detail. 
The examples illustrate, but do not limit the invention. 

Example 1: A type III secretion system is present in a pathogenicity island in Bordetella
pertussis.

The presence of an lcrD homologous gene in the Bordetella pertussis genome was
investigated by polymerase chain reaction (PCR). The primers used (oligos 95080 and
95081 shown in Table 1) were degenerate oligonucleotides corresponding to highly
conserved regions of the amino acid sequences of the LcrD/FlbF family of proteins.
These primers were also designed to favour the amplification of virulence genes instead
of their paralogues, the flhA or flbF flagellar genes present in flagellated bacterial strains.
The 3' triplet CAT in oligonucleotide 95081 is a determinant: when multiple sequence
analysis is done using known homologous sequences (database searching was done with
either the FASTA and TFASTA programs of the GCG9 package, or with BLASTN, BLASTP
and BLASTX programs, and alignments were carried out with the PILEUP program from
the GCG9 package), it can be seen that the CAT triplet, the reverse complement of an ATG
codon, codes for a methionine which is present exclusively in virulence sequences and
absent from the flagellar ones.
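The primer design logic can be checked directly from the sequences listed in Table 1. The sketch below expands the IUB ambiguity codes, counts how many distinct oligonucleotides each degenerate primer represents, lists the amino acids each codon can encode, and reverse-complements oligo 95081 so that it reads on the coding strand, where its 3' CAT becomes the ATG (methionine) codon discussed above. It also computes the amplicon length implied by the Table 1 codon positions, assuming the primers sit exactly on those codons and that the B. pertussis gene has no insertions or deletions relative to lcrD in this stretch. The result, 152 bp, agrees with the fragment size reported in this example. Function names are illustrative.

```python
from itertools import product
from math import prod

# IUB/IUPAC nucleotide ambiguity codes and their complements.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(primer: str) -> int:
    """Number of distinct plain-DNA sequences a degenerate primer represents."""
    return prod(len(IUPAC[base]) for base in primer)


def reverse_complement(primer: str) -> str:
    """Reverse complement that keeps ambiguity codes (e.g. D becomes H)."""
    return "".join(COMPLEMENT[base] for base in reversed(primer))


def codon_options(seq: str):
    """Set of amino acids each complete (possibly degenerate) codon can encode."""
    options = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[pos:pos + 3]
        expansions = product(*(IUPAC[base] for base in codon))
        options.append({CODON_TABLE["".join(c)] for c in expansions})
    return options


def amplicon_length(first_codon: int, last_codon: int, bases_into_last: int) -> int:
    """Length from base 1 of `first_codon` to base `bases_into_last` of `last_codon`."""
    return (last_codon - first_codon) * 3 + bases_into_last


oligo_95080 = "GSHATGCCWGGHAARCARATG"    # direct, lcrD codons 150-156
oligo_95081 = "GCRTCDCCYTTDACRAAYTTCAT"  # complement, lcrD codons 193-200

print(degeneracy(oligo_95080), degeneracy(oligo_95081))  # 144 144
print(reverse_complement(oligo_95081))                   # ATGAARTTYGTHAARGGHGAYGC
print(["".join(sorted(s)) for s in codon_options(oligo_95080)])
print(amplicon_length(150, 200, 2))                       # 152
```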

When analysed on agarose gel, the PCR product appeared as a heterogeneous mix of
fragments, one of which had the expected size (around 150 bp). A second
round of amplification using the approximately 150 bp DNA as template yielded a single
amplicon which was cloned in pCRII (obtained from Invitrogen) for further
characterisation. It appeared as a 152 bp fragment whose nucleotide sequence (Fig. 1),
although similar to all lcrD/flbF homologous genes, shares a higher level of identity with
the virulence (lcrD-like) genes.

Table 1

Oligo   Sequence (1)                      Features                     lcrD corresponding codons (2)
95080   GSH ATG CCW GGH AAR CAR ATG       direct, degenerate           150 to 156
95081   GC RTC DCC YTT DAC RAA YTT CAT    complement, degenerate       193 to 200
95363   CC ATC GAC GCG GAC TTG CGC G      direct, non-degenerate       157 to 164
95364   CGC GCC GTC CAT GGC GCC ATA       complement, non-degenerate   186 to 192
96110   C CGA CGC CGA CGC CGT ACG GTC     direct, non-degenerate       172 to 179

(1) The letter code for nucleotide ambiguity proposed by the IUB (Nomenclature Committee,
1985, Eur. J. Biochem. 150: 1-5) was used.
(2) The DNA sequence of the lcrD gene from Yersinia enterocolitica used for this work
was published by Piano et al. (1991).

To ensure that the cloned fragment was actually a B. pertussis sequence, PCR was
performed under stringent conditions with serial 10-fold dilutions of DNA from B.
pertussis. Stringent PCR conditions require a perfect match between
template and primers. It was likely, however, that due to the degeneracy of the original
primers, the 152 bp sequence initially obtained had, at its boundaries, a few base pair
differences from the actual B. pertussis lcrD-like (hereafter called bcrD) sequence. A
nested PCR approach using internal primers (oligos 95363 and 95364, Table 1) was
therefore preferred, as these primers are known to match the B. pertussis sequence. A
dose-response relationship was observed between the 10-fold dilutions of B. pertussis
template DNA and the product of the nested PCR, suggesting that the 152 bp amplicon
actually originates from the Bordetella genome.

Comparison of the 152 bp sequence with lcrD/flbF genes allowed us to define a
specific DNA stretch (oligo 96110 in Table 1) which was used as a probe for screening a
genomic library of B. pertussis constructed in the plasmid vector pBR327 (Delisse-Gathoye
et al., 1990, Infect. Immun. 58: 2895-905). Several positive clones were isolated
and restriction analysis of their resident plasmids showed that they harboured
overlapping inserts. The entire nucleotide sequence of one insert was determined,
revealing a large open reading frame (ORF). This 2100 bp ORF encoded a 75 kDa
polypeptide which is 59% and 47% identical to the yersinial proteins LcrD and FlhA,
respectively. Multiple amino acid sequence comparisons of all known members of the LcrD/FlbF
family of proteins, including the B. pertussis BcrD deduced amino acid sequence,
showed that this sequence clearly ranked within the virulence-associated determinants
(Fig. 2). These data strongly suggest that B. pertussis possesses a type III export system
involved in the secretion of virulence effectors.
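As a quick consistency check on the figures above, the size of the encoded protein can be estimated from the ORF length. This sketch assumes that the 2100 bp include the stop codon and uses the common rule-of-thumb average of 110 Da per residue. Both are assumptions, and the true mass depends on amino acid composition.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; real value depends on composition


def predicted_protein_size(orf_length_bp: int, includes_stop_codon: bool = True):
    """Return (number of residues, approximate mass in kDa) for an ORF length in bp."""
    if orf_length_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    residues = codons - 1 if includes_stop_codon else codons
    return residues, residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


residues, kda = predicted_protein_size(2100)
print(residues, round(kda, 1))  # 699 76.9
```

An estimate of about 77 kDa is in line with the 75 kDa stated for BcrD.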

The B. pertussis lcrD-like nucleotide sequence (bcrD) has been submitted to
EMBL and assigned the accession number Y13383.

This general technique has been useful for determining the presence or absence of a
type III secretion system in other bacterial strains. The human pathogens Borrelia
burgdorferi and Helicobacter pylori were intensively screened for such a system using
this technique. No evidence for a type III secretion system could be found. The
subsequent publication of the genome sequences of these microorganisms has confirmed
the absence of similar systems in these species. In contrast, the method allowed the
amplification of a DNA fragment from the phytopathogen Pseudomonas corrugata,
which clearly ranks among the virulence sequences. This technique could be applied to
any Gram-negative pathogen of medical or agronomic importance, such as Neisseria spp.,
Moraxella catarrhalis, Vibrio cholerae, any Enterobacteriaceae, Pseudomonas spp.,
Haemophilus influenzae, Brucella spp., Francisella tularensis, Pasteurella spp. and
Legionella pneumophila. Even in strains that have been fully sequenced, this technique
can be used as a simple method for checking alternate types or strains of the same
species. For instance, some types of pathogenic Escherichia coli harbour a type III
secretion system whereas others do not.

Example 2: Analysis of the B. pertussis bcrD flanking sequences to characterise the
pathogenicity island and virulence-related proteins encoded therein

The tendency for systematic clustering of type III encoding genes inside
pathogenicity islands prompted the analysis of the B. pertussis bcrD flanking sequences.
The whole region containing the pathogenicity island was sequenced by chromosome
walking, taking care that each pathogenicity island region was
represented in at least two independent clones, to avoid possible artefacts due to
chimeric DNA inserts. This revealed clustered ORFs that could be classed in 3
categories: class I ORFs (table 2); class II ORFs (table 3), the effector proteins,
which have the best vaccinal and diagnostic properties; and insertion sequences and ORFs
homologous to house keeping genes of other species (table 4). Although there is no
general rule for defining the boundaries of a pathogenicity island, they can be
demarcated by a direct or inverted repeat at one or other boundary; however, the absolute
demarcation of the boundaries can only really be done by the detection of house keeping
genes at the extremes of the sequence. In the present case, an insertion sequence (IS in
Fig. 3) was present at the 5' end of the island (separating the virulence ORFs from the
house keeping genes), but absent at the 3' end. In addition, the presence of house
keeping genes (greA and ICFG-like) surrounding a locus which, according to sequence
data, encompasses numerous virulence sequences is a good indication of the boundaries
of the island. The complete gene organisation of the pathogenicity island is schematically
represented in Figure 3. The precise definition of the PAI boundaries requires further
experimental data, such as the characterisation of the corresponding chromosomal region
of a Bordetella strain which is devoid of a type III secretion system.
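The two-clone rule used during chromosome walking amounts to a coverage check along the assembled island. The following is a minimal sketch, assuming clones are recorded as 1-based inclusive (start, end) coordinates on the island sequence. The clone coordinates in the example are hypothetical, not the clones actually sequenced.

```python
def low_coverage_intervals(clones, region_start, region_end, min_clones=2):
    """Return (start, end) intervals covered by fewer than `min_clones` clones.

    `clones` is a list of (start, end) pairs, 1-based and inclusive.
    """
    size = region_end - region_start + 1
    delta = [0] * (size + 1)  # difference array: +1 where a clone starts, -1 after it ends
    for start, end in clones:
        first = max(start, region_start) - region_start
        last = min(end, region_end) - region_start
        if first > last:
            continue  # clone lies entirely outside the region
        delta[first] += 1
        delta[last + 1] -= 1

    gaps, depth, gap_start = [], 0, None
    for offset in range(size):
        depth += delta[offset]
        position = region_start + offset
        if depth < min_clones and gap_start is None:
            gap_start = position
        elif depth >= min_clones and gap_start is not None:
            gaps.append((gap_start, position - 1))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, region_end))
    return gaps


# Hypothetical clone coordinates on a 200 bp stretch.
print(low_coverage_intervals([(1, 100), (50, 150), (90, 200)], 1, 200))  # [(1, 49), (151, 200)]
```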

Table 2

Name   Coding sequence from/to       Coding DNA strand   Homologous genes (from Yersinia,
       (with reference to Fig. 5)                        unless otherwise specified)

Class I genes, i.e. genes coding for determinants involved in the secretory apparatus and their
regulation

bcrD   [illegible]                   complement          lcrD
bcrH   [illegible]                   direct
bscC   26955/[illegible]             direct
bscD   7379/8659                     complement
bscE   7039/7338                     complement          none
bscF   6783/7049                     complement          yscF
bscI   17892/18218                   direct              yscI
bscJ   18215/19039                   direct              yscJ
bscK   19032/19694                   direct              none
bscL   19664/20302                   direct              yscL
bscN   20307/21641                   direct              yscN
bscO   21641/22150                   direct              yscO
bscP   22147/22695                   direct              none
bscQ   22692/23771                   direct              yscQ
bscR   23768/24439                   direct              yscR
bscS   24445/24711                   direct              yscS
bscT   24723/25523                   direct              yscT
bscU   25520/26569                   direct              yscU
bscV   26566/26964                   direct              none
brpL   28778/29380                   complement          hrpL (Pseudomonas syringae)
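The coordinates in Table 2 can be checked for internal consistency. The sketch below assumes that "from/to" values are 1-based and inclusive, and uses only the direct-strand genes whose coordinates are fully legible (bscI to bscV). For each gene it reports the length, whether that length is a whole number of codons, and how far the gene overlaps the preceding one. Six neighbours overlap by exactly 4 nt, the pattern expected when a TGA stop codon overlaps the next ATG start (ATGA). That reading is an interpretation to be confirmed against the Fig. 5 sequence.

```python
# (name, from, to) for the direct-strand Class I genes of Table 2.
DIRECT_STRAND_GENES = [
    ("bscI", 17892, 18218), ("bscJ", 18215, 19039), ("bscK", 19032, 19694),
    ("bscL", 19664, 20302), ("bscN", 20307, 21641), ("bscO", 21641, 22150),
    ("bscP", 22147, 22695), ("bscQ", 22692, 23771), ("bscR", 23768, 24439),
    ("bscS", 24445, 24711), ("bscT", 24723, 25523), ("bscU", 25520, 26569),
    ("bscV", 26566, 26964),
]


def summarize_genes(genes):
    """Return (name, length_bp, whole_codons, overlap_with_previous_bp) for each gene."""
    rows = []
    previous_end = None
    for name, start, end in genes:
        length = end - start + 1
        overlap = 0
        if previous_end is not None and previous_end >= start:
            overlap = previous_end - start + 1
        rows.append((name, length, length % 3 == 0, overlap))
        previous_end = end
    return rows


for row in summarize_genes(DIRECT_STRAND_GENES):
    print(row)
```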

Table 3

Name    Coding sequence from/to       Coding DNA strand   Homologous genes (from Yersinia,
        (with reference to Fig. 5)                        unless otherwise specified)

Class II ORFs which putatively code for effector proteins

bopN    11906/13003                   complement          yopN (= lcrE)
orf1    6160/6747                     direct              none
orf2    10752/11120                   complement          none
orf3    11117/11527                   complement          none
orf4    11532/11909                   complement          none
orf5    [illegible]                   direct              none
orf6    [illegible]                   [illegible]         none
orf7    [illegible]                   [illegible]         [illegible]
orf8    [illegible]                   [illegible]         [illegible]
orf9    [illegible]                   direct              [illegible]
orf10   [illegible]                   direct              [illegible] (Pseudomonas aeruginosa)
orf11   29412/29591                   complement          none
orf12   29555/30529                   complement          none
orf13   30631/31776                   direct              none
orf14   31773/33005                   complement          none
orf15   32370/33014                   direct              none

Table 4

Name             Coding sequence from/to       Coding DNA strand   Homologous sequences
                 (with reference to Fig. 5)

Insertion sequences and house keeping genes

(none given)     711/2024                      direct              uracil permease genes of numerous bacteria
(none given)     2055/3590                     complement          chemoreceptor genes of numerous bacteria
(none given)     4220/4696                     direct              greA (Escherichia coli)
(none given)     4998/5948                     complement          transposase genes of numerous bacteria
(none given)     33002/34852                   complement          ICFG gene (Synechocystis sp.)



Next to the bcrD gene, there is an open reading frame (ORF) whose deduced 
amino acid sequence shares significant similarities with the YscU protein of Yersinia spp 
(39% identity and 51% similarity) and other known YscU homologs (Fig. 4). YscU, like 
LcrD, is a component of the Yersinia type III secretion machinery involved in the 
virulence mechanisms of the bacteria. B. pertussis therefore possesses a classical type III 
secretion system which is most probably involved in pathogenicity. This latter point can
be investigated through phenotypic analyses of mutants (see below).

The total length of the Pai is approximately 30 to 40 kb. The DNA sequence of 
the whole region is presented in Figure 5, and is referred to in tables 2, 3, and 4. 

No homologies could be found between the B. pertussis Class II Pai DNA 
sequences and the sequences reported in the GenEMBL databases (except for those 
stated in table 3). The expressed products of these unknown genes within the Pai
responsible for virulence will be useful in the development of a vaccine formulation
against pathogenic Bordetella pertussis. 

To address the precise function of the Pai, a bcrD mutant was engineered by
allelic exchange. In the resulting mutant, the bcrD gene was disrupted by an aphA-3
cassette conferring kanamycin resistance. This cassette was inserted in such a way that
translation was not interrupted, avoiding any polar effect on expression of putative
downstream cistrons. A mutant has been isolated and its associated genotype is
currently being analysed. ...
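Whether a cassette insertion preserves translation, as required above to avoid polar effects, can be checked at the sequence level. The sketch below is illustrative only. The gene and cassette strings are toy sequences, because the actual aphA-3 cassette and bcrD insertion site are not given here. The check covers only the reading frame and in-frame stop codons, not transcription terminators or other causes of polarity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame_insertion(gene: str, cassette: str, offset: int) -> bool:
    """Check that inserting `cassette` at 0-based `offset` keeps the reading frame open.

    `gene` is assumed to run from its start codon to its stop codon. The insertion is
    accepted only if it lands on a codon boundary, its length is a multiple of 3, and
    no stop codon appears in frame before the gene's own final stop codon.
    """
    if len(gene) % 3 or offset % 3 or len(cassette) % 3:
        return False
    mutant = gene[:offset] + cassette + gene[offset:]
    internal_codons = (mutant[i:i + 3] for i in range(0, len(mutant) - 3, 3))
    return not any(codon in STOP_CODONS for codon in internal_codons)


toy_gene = "ATGAAACCCGGGTAA"  # hypothetical 5-codon gene, not bcrD
print(is_in_frame_insertion(toy_gene, "GCTGCT", 6))  # True
print(is_in_frame_insertion(toy_gene, "TAAGCT", 6))  # False: in-frame stop codon
print(is_in_frame_insertion(toy_gene, "GCTG", 6))    # False: frameshift
```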

We claim:



1. A method of identifying virulence genes from a pathogenicity island containing a type
III secretion system from pathogenic strains of bacteria, comprising:

designing degenerate PCR primers complementary to well-conserved regions
specific to the LcrD polypeptide of Yersinia;

amplifying the polynucleotide containing the DNA sequence between (and
including the DNA sequence of) the primers of lcrD-like genes present in said
pathogenic strain of bacteria;

sequencing the lcrD-like gene;

determining whether the DNA sequence is more homologous to the
virulence-associated family of lcrD-like genes, or to the flagellar-associated
family of lcrD-like genes; and

if virulence-associated, [...] identifying genes within this sequence.

2. A method of determining whether a particular bacterial strain harbours a type III
secretion system, comprising:

designing degenerate PCR primers complementary to well-conserved regions
specific to the LcrD polypeptide of Yersinia;

amplifying the polynucleotide containing the DNA sequence between (and
including the DNA sequence of) the primers to determine the presence of any
lcrD-like genes in said bacterial strain;

if amplified successfully, sequencing the lcrD-like gene; and

determining whether the DNA sequence is more homologous to the
virulence-associated family of lcrD-like genes, or to the flagellar-associated
family of lcrD-like genes.

3. An isolated polynucleotide comprising a nucleotide sequence encoding the BopN
polypeptide of Bordetella pertussis.

4. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp1
polypeptide of Bordetella pertussis. 

5. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp2 
polypeptide of Bordetella pertussis. 

6. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp3 
polypeptide of Bordetella pertussis. 

7. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp4 
polypeptide of Bordetella pertussis. 

8. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp5 
polypeptide of Bordetella pertussis. 

9. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp6 
polypeptide of Bordetella pertussis. 

10. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp7 
polypeptide of Bordetella pertussis. 

11. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp8 
polypeptide of Bordetella pertussis. 

12. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp9 
polypeptide of Bordetella pertussis. 

13. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp10
polypeptide of Bordetella pertussis. 

14. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp11
polypeptide of Bordetella pertussis.

15. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp12
polypeptide of Bordetella pertussis.

16. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp13
polypeptide of Bordetella pertussis.

17. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp14
polypeptide of Bordetella pertussis.

18. An isolated polynucleotide comprising a nucleotide sequence encoding the Omp15
polypeptide of Bordetella pertussis.

19. An isolated polypeptide encoded by the polynucleotide of any one of claims 3-18.

20. A vaccine comprising the polypeptide of claim 19. 

21. A kit for diagnosing infection with B. pertussis bacteria in a human comprising a 
polynucleotide of claims 3-18 or a polypeptide of claim 19. 

1 GATCTCCAGC TTGATGTCCG GATGGGCCTT CTCGAAGGCC GCCTTGTAGG

51 CCTGGGTCAG GTCCTTGGGG AAAGACGTGA TGACCGTTAC CGTGCCGGCC 

101 AGGGCCGGCG CGCAGGCAGC CATGATGGCC GCTGCGAGGA CTGGGCGCAA 

151 CGCTTGCATG GGTCTCCTCC TTTCTTGAGT TGTGGCAAgA CCTTAATGCC 

201 CGTGTTCCCC CGAGTCCAAT CAGAAATTTT GATGCGCGCC ATCaACgGCG

251 CGGCcGGCTT CAGCGCGCCG CCAGCCGCTG CGCCAGGCGC AATTGCTCTT 

301 CCAGCAGCTT CTGGATCTGC CCCTGGATGG CGAAGGCCTG GCGGTTCAGG 

351 TCCTGCAGCT GCCGGGCCTG CTGCTCGGCG GTGGCGTTGC TTTCGCGCAC 

401 CTGGCGCATC TGGCGCATGA TGGCCGCCAG CTGGCGCTGC AGGGCCTGCA

451 AGGCCTCGGC CGCGGGGCTG CCGGCCTCGT CCTTTGCGCC GCTGGCGGCG

501 GCCCGGTCGC GCAACTGCCT GGCGTACTGG TCGAACGCGC CGGCGGAGAT 

551 GCGCGATGAA ACGGAACTGA CGGTCATGGT GGCGGCGTCC GCGTGGATTA 

601 GGGTTAGTCC TTAATACGGC AGGCGCGCGG ACGAACTTAA ACATATTGCT 

651 GCGCGCTTCC GACCCCCCTT ACACTTGCGC CGCCCATCCC TTCAACTTCC 

701 ACGCATACCT ATGTCCAATA CCTATTTCCC GCGCTGGCGG CTGGCCGACG 

751 ACACCGTGCC GGGCGCGGTC ATCGCGCCCG ACGAACGCCT GTCCTGGCCC

801 AAGAACATCG CCATGGGGGC CCAGCACGTG GTCGCCATGT TCGGTTCCAC 

851 CGTGCTGGCG CCGCTGCTGA TGGGTTTCGA CCCCAATGTG GCGATCCTCA 

901 TGTCCGGCAT CGGCACGCTG ATCTTCTTCC TGTTCGTCGG CGGCCGGGTG 

951 CCCAGCTACC TGGGCTCCAG CTTCGCCTTC ATCGGCGGGG TGGTGGCGGT 

1001 CACCGGCTAT GTGGCGCCCG GCGCCAACGC CAATATCGGC GTGGCGCTCG 

1051 GCGCGATCAT CGCCTGTGGC CTGGTGTACG CGCTGATCGG CCTGGTCGTA 

1101 TGGGCGGCCA GCGCGCGCGG CAACGGGGCG CGCTGGATCG AGGCCATGAT

1151 GCCGCCGGTC GTCACGGGCG CGGTGGTGGC GGTGATCGGC CTGAACCTGG 

1201 CCCCGATCGC CGCCAAGGGC GCCATGGGTT CGTCCGGCTT CGAGGCCAGC 

1251 ATGGCGTTGA TGACCATCCT GTGCGTGGGC GGCATCGCCG TCTACACGCG 

1301 CGGCATGGTG CAGCGGCTGC TGATCCTGGT CGGCCTGGTG CTGGCCTGCG 

1351 TCATCTACGC GGTCTGCGCC AACGGCCTGG GGCTGGGCGC GCCCATGGAC 

1401 TTCGCCAAgG TGGCCGCCGC GCCGTGGTTC GGCCTGCCCA GCTTCGCCGC

1451 GCCGGTGTTC GAgCCGCAGG CCATGGGCCT GATCGTGCCG GTGGCCATCA

[...] CCTGCTCCTC [...]

[...] CGTGCGGCTG [...]

[...] GACAGGTTGG [...]

2851 GCCGCGCCGA CGTTGATTTC GTCGACGCCG CGCCGCATGA CGGCGATGGT

2901 GCGCGTCAGG CCTTCCTGCA TGCGCTTGAG CGCCGCGAAC AGCGCGCCGA

2951 TTTCATTGGC CGAGCGCACC TCGATGCGCG CGGTGAGGTC GCCGTCGGCG

3001 ATGCGGTCGA AATGATGGCC GGCCTCCAGC AAGGGGCGCA [...]

3051 GCGCACGAAC AGCCAGCCGG [...]

3101 GCGCGATGGC GCTCCAGCGG [...]

3151 CGCACTTCGT CGCTGTGCGC [...]

[...] GCGCTGGAAT GCATGTTCGG [...]

[...] CGGGCTCgGC CTGgCCGGCG [...]

[...] GACTGGTAGG [...]

[...] TTGGGCACGT [...]

3401 GGCGCTGCGC [...]

[...] CCACGTCAGG [...]

3551 CCGCCACCAT CGCGGTGCGG [...]

[...] TGgGTAGGGT CTTGCGCAAT [...]

3651 GCGGGAAGTG TCAGGCCGTG [...]

3701 ACTTAAATGA TCGGTATTCG [...]

3751 CGAATATTCA TACCCGGATT [...]

3801 cTTTcGATTG CCCGGgGCcT [...]

3851 GGGCCCGCGC TGCGATCGGC [...]

3901 CAGCCAACAT GCGCCGAAAC [...]

3951 GTGGACTTTG CCGACAATGT [...]

4001 GGGGTTTTGT TTTGGTCGAC [...]

4051 AAGAGGAGAC AGCAAAGGGG [...]

4101 TTTTGTTCGC CTGTCTTCCG [...]

4151 TACACCAAGC CGAGACCTTC [...]

4201 CTTTTGTTTG GGAAACGACA [...]

4251 CCGAGCGCTT GCAGCAAGAA [...]

[...] GCGGTGATCA GCGCCATTGC [...]

4351 AAATGCCGAG TACGACGCCG [...]

[...] GGATCTCCGA ACTCGAGGGC [...]

[...] ACGGCGCTCG ACGCCGAAGG [...]

[...] CGAAGACCTC GACTCGGGCG [...]



CCAGGGTCAG CAGCACGCCC AGCGTGGTCA 
GCCACGACAT GGGTGTCCTC gGnGCCGCTG 
CTGTACCCTG GCCAGAAAGG CTTCCATGTC 
CCTGCTGCGC GCGATGCATG GCGGCCAGGG 
TCGACGGCGG CGGCCAGCTC GTCCAGCACC 
CGTCTGCAGG GTCGCGGCCA GCTCGGCCGC 
CCACGTAgCG CTGGAATTGC gTGGCCGCCT 
GCCTCGAACA GCGGGTCGAC CTGgTTGACG 
GATCTCGGCG GCGCTCCTGC CCGCGTTGCG 
GCATCAGCAC CGCCAGGAAg CACGCGAATA 
ACCTTGATCC TTTCAAgCAT GGGCATCTCC 
AAACGTCGAC GACGTGCGCG GCGTACCCGC 
ACGTGGAATC CGTTGTTGAT GTGGAATTCG 
GTATTAATTT GTTATTTGTA GTTATATATA 
TGCCCTAAGT TGGTGCGTTC TCGACGGGTG 
GCGGCGCATG AAGGAATGcG CGGCAACgCc 
GGCCAACACG CAGGTTTTGT GGCTTTTCCG 
CTACGCCGGG CTTACAGGCT TGCAATTCCG 
CATCTGATTG CCCCGGTTCC GACCCGAGCC 
GCTTGGCCGC CGGATGCGGC AGGCCGATCA 
AGCCTCGGTC GGGTTCGACC TTGTCTCGTC 
CAGCGGCCCG CCTGTTTCAT GGCGAGGCCA 
ACCGAAACGC TCCGTCGGGG GCGTTTTCTA 
TGTCTGCCAT TCCTTTGACC GTGCGCGGGG 
CTGCATCGGC TTAAGACCGT TGAGCGTCCT 
GGAGGCGCGT GCGCAGGGTG ATTTGTCGGA 
CCCGCGAACG CCAGGGCTTC ATCGAAGGCC 
ACGCTTTCGA ACGCGCACCT CATCGATCCA 
CCGTGCCGTG TTCGGCGCGA CCGTGGAAAT 
ACCGCCTGAC CTACCAGATC GTGGGCGACG 
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5101 


AGCCAGGATT 


TCATGGCATC 


5151 


AGCGTAAGCC 


CACTCACGCA 


5201 


CATTGGTCTG 


TGGGCGGTAA 


5251 


TCATGGCACA 


gCGCGGCGAA 


5301 


GGTGAGCAAG 


CGCTGGATGG 


5351 


CGTCCTTGAG 


GAACTGGACG 


5401 


TCGGTGAAGG 


CCACGCGGGC 


5451 


CCAGCCGGCC 


CCCTCAACGG 


5501 


CAGGGCGCTG 


GATACGTCCC 


5551 


CCGGGGGCCT 


GATGCTCGTA 


5601 


GGCCAGGTGC 


GACAGACCGG 


5651 


CTGACACGCC 


CAGCGCCTGG 


5701 


CGCAGCTCCA 


CGATAGCCAG 


5751 


GACCGTCGGG 


CGCGAGGACG 


5801 


GGAAGCGGCC 


CAGCCATTTG 


5851 


CGGGCCGCTT 


CAGGCACACA 


5901 


TTCGAGTCGA 


CGTAGGAAGG 


5951 


GGCCGGGCTC 


CTTGAGTGAA 


6001 


ATCCGGTTCG 


GATGAACCAT 


6051 


CGCGCGTGGC 


GCGGAAAGAC 



AACCTGATTT CGGTCTCCAG CCCGGTGGCC 
CGAGGGCGAT GTGGTCGAAG TGAAGGTGCC 
AAGTCATCGG TGTGCGTTAT CTCTGACGCC 
CCATGGCCAA TGACCGACGC CGTTTCCACC 
CAGGCGGGGT GGCACAGCCG CAGCCGTCGC 
GATATCCCGG GCTCCGGTCG TGGCCAGCGT 
ACGGCTTGGG GGGCGCGGTA TTCATCCGAA 
CGCCGGGCGT CTGTATCGTT GTCGCGCCGC 
GGGCGCCGCA ACGCCGGGCC GCGCCGCCTA 
TGTATTCGTC CAGGTTGAGT CTGGAGATGG 
TGGTGGGGTC GATGCCAGTT GTAGTGGTGT 
GGCTCGGTGT TGGGAGTTCT GGTAgGTGTG 
AGGCCGACTG GATGAAGCGT TCGGCCTTGC 
GGTCGGGTAA AGCGGTGCTT GATGCCCAGC 
nGCGCgGCTG CgAAAgGCCG AGCCATTGTC 
TCACGCCCAG GCGCTGGTAG TAGGCCACTG 
GCGCTGGGGA AGCGCTCGTC GGGGTGGATG 
GTGGTCATCG ATGGCCACGA AGACGAAGTC 
TATCGCGTCG GTTGCCCGTG ACCCGGTGGC 
AGCTTCTTGA TGTCGATGTG CAGCAGATCG 
GCGCACCACC GGCTCGGCCG GCTCCAGGTC 
CGCGGGCCAG GACGCGGCTG ACGGTGCTGG 
GCGATGCGCG CTTGGGTCAG CCGCTTGCGG 
CGCCTTGGCC GGCGCAATCG CTCGGGGCGA 
CATCGGCCAA GCCCGCCTGG CCCTGAGCCA 
CGCACAGTCG GCGCGGTGAC CCCATAGGCG 
AACTTGATGG GCGATCAATT GCTGGACCAT 
TCAATCGGGC ATGCTTATGG GTGTTCATCC 
CTGGGGGGgT GGCGATTTCC AGTTTCTCAA 
GCATACAACC TATTGAATCT TCACAACTAG 
CAGCAGGTCG GCCGTCACCG GTTCCCTGTT 
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6101 


GTCGAATAAC 


AGGTAATGCG 


TGGCGTCCGG 


ATATTCGCCA 


TGGCGCGCGC 


6151 


CGGATCCGCA 


TGCCCTTGGC 


GGGGTCCTGC 


GGCCGGACGT 


CCGAAAACCG 


6201 


GGATATTCCG 


AGAAGCAATC 


GGCTGGCGCC 


TTCCAGACTG 


TGCGCATACC 


6251 


ATTGCCCTCT 


TTTGCCACGC 


ATTTCGAGCG 


TATGGTTTCC 


CTTGCGCCCG 


6301 


ACGCAGGCTC 


GCCTCGCCAT 


GACCGACACG 


GCATACCACC 


AACTCATCGC 


6351 


CGATTTCGGC 


CGCCTCATCG 


GCATCGACTC 


GCTCAACCCC 


GGTGCCGGCG 


6401 


GCCTGTGTCA 


GTTGATTTTC 


GAACCGTGCG 


CACCGGTCTT 


CATCGCACCG 


6451 


GTGCACGCCC 


GGACGGAAAT 


CATGATTTCC 


TGCGTGCTGG 


GCACGGCGGA 


6501 


CGCGGCCAAC 


CCGGCAAGCA 


TGGCCCGAGC 


CAACTTCATG 


CAGGCCGGCA 


6551 


GCGGCGTCGT 


GGCCTGCATC 


GGCGGCGATG 


GGTTGTTCTA 


TCTGCAGCAG 


6601 


GCCATACCCC 


TGTCGCGCGC 


CACGCCCGCA 


ATCCTGCTCG 


ATCACTGTGA 


6651 


GCGTCTGCTG 


CAGGAAGCCT 


CGCGCTGGCG 


CGTCGGCGAC 


CACGACGGCT 


6701 


GCGCCACCTC 


GGCCCCGAAT 


ATCGCCGCGC 


TGACGCGCGG 


CGTCTAGGTC 


6751 


GGCGGCGACT 


GCCGGGCGCC 


GCCGCGGCAG 


GCTCAACTCG 


CcTTCTGTAT 


6801 


GACGCCCTTG 


AGCGAATCCG 


CGACCTGCTT 


GACCACCGTG 


CTCTGTAGAT 


6851 


CGATCATCAC


GACCCACGAT 


TGCATTTCCT 


GTTGCACGAT 


CAGCAGGTCG 


6901 


GACGTGCTGA 


CCGCGCCGTC 


TCCGCGCGCG 


CTGAGCGCCT 


CCAGGCGGCT 


6951 


GCGCAGGTCG 


CGTTCGTGAG 


CGTTCAGCCG 


CGTATTGACC 


GCCTGGTTGA 


7001 


CGCTCTGCAT 


GGTCACTCGG 


CCTGCGTCGC 


CTCCCAGGTT 


AATGGCCATG 


7051 


CTTGTCTCCT 


TCGGCGCATT 


GTTCATTGCT 


CAGGCGCGTC 


AAGACTGACG 


7101 


CCGGAGGGTT 


GTCCGGCCCG 


GTCGGGCGCT 


GCAGCAATAC 


CTGCCGGGCC 


7151 


GCGGTGGCGG 


CCGCCAGCGC 


GTCGCGCCAT 


ACGGGATAGG 


TGTCGCGCCC 


7201 


CACGCCATCG 


TGCAGGCGGA 


CGCGCAAACG 


TGTCTCCAGT 


TCGCCAAGCT 


7251 


GCGACAGCAA 


GGTGTCGCGC 


AAGGCGGAAC 


CGCCCGGCGA 


TGCCAGGCGC 


7301 


ACTTCCAGTT 


CGGTCAGGGC 


TAGAACAGAT 


GTACTCATAT


GATGTTGCAG 


7351 


ACGAGGGTTG 


ACGGCTACCG 


AGGCGATTTC 


ATCGCGTCAC 


GATGACCGGT 


7401 


TCGGGACCAT 


CGAAGACCAG 


GCGGCCGGAT 


TCGATGGCGG 


TAAGGCGATA 


7451 


CTGGTCCCGC 


AGTCCGCCCA 


CCAGGAGGCG 


GCTGCCATCG 


GCCAGCATCA 


7501 


GGTACGGTTG 


CGGGCCGCTC 


ACGACACTGC 


GTATCTCGAA 


CGGCACGTGA 


7551 


TCCCGCGTCG 


CGCGCGCGGC 


GGTGGCCGGC 


AGCCGGACGA 


CGTCGTAGTT 
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7601 


GCGCTGGTTG 


AACGCGGCCA 


7651 


CCGCCAATCC 


GCCGGGATCT 


7701 


TTGACGCCGT 


CGAGGCGTTC 


7751 


GACCTCGTCG 


GCCAGGCGTA 


7801 


GGCGCATGCG 


CACCGCATGC 


7851 


GCGATGCCCG 


AGATCGCCAG 


7901 


GCGCACCCCG 


AATGTCGCCA 


7951 


CCTGCCTGCT 


TACCTGCATG 


8001 


CTGGCGACCC 


GAGCGAATTC 


8051 


GAGCACGCCG 


CCACGGCCGT 


8101 


GGCTGTCGAT 


GAGCGCCGCG 


8151 


GGCGGCGCGG 


CCGGCGGCGT 


8201 


GGCCAGGCCG 


ACCAGCAGGA 


8251 


GCCGTCCTGC 


ACGTCGGCGC 


8301 


GTCGAGCCAG 


GGACGTCGTG 


8351 


CGGCTCCGGC 


GGCGCGGGCC 


8401 


AGGCGGCTCC 


CAGCTCTACG 


8451 


GCCTGGGCGT 


CCAGGCCGGG 


8501 


CTGGTCGATC 


TCCAGCCATC 


8551 


GGACGATATC 


GCAATGCGGA 


8601 


GGGCAGCGCG 


CCATGCACTG 


8651 


CGTCGTCATA 


GATCCACCCT 


8701 


CAGTTCCTGG 


TAGGACAGCA 


8751 


TCTTGCGCGT 


GTAGCGCCGG 


8801 


CTCGCGCCGG 


CGGCCAGATC 


8851 


TCGGCGTGTC 


GTGTCCGGAT 


8901 


GCCGGATGGC 


GGCGCGCACG 


8951 


GCGGGCAGGA 


TATTGTGGCC 


9001 


GAGTGCGATT 


CGGACATACT 


9051 


GGCCCCATTC 


GACCAGCGCT 


9101 


ACTTCTTCGG 


AAACAAGGCG 



CCAGCTCGCG CAGGCGCGCC ATGCGGCCTG 
GCGTCCAGGC GGTCGGCGTG CCAGCTGAGC 
GTCGGCCAGC TGGGCCGCGA ACTGGGCCGA 
CATCGCGACC GAGGATCGTC ATGCCCGGCA 
AGCGCCGCGG CGCGTTCGTG CGCATCGCTG 
GCGGCCATTG CCGTACGGGC GCGCCATGTA 
GGACATCGCA GGCCAGGGCC CTGGCCTCGT 
GCAGGCCGTG GCGCAAGTTG CGCCAACGCC 
CGTCTCGTCG TGCACCCATC CGGTCACGGT 
AGGCCGCTTG TAATTGCTCG GTAAGGCCCA 
GCGCGGACCA GCGGCGCGGT GGGCGTTGGG 
GGCGGGTGTG GTCACGGAAA CCAGCGCCGT 
CGGCCGCGGC CGCGCCCAGC GCCAGCCAGG 
GGCATGAGGG CAGCGACGGA CGGCGGGCTT 
CAAGGCTGTG TCGCTGCCGT CCGGGCCGCA 
ACGGCGCGGA AGGGGCGGCC ACGGTGATCC 
GGTTCGTTGA AGGCCGCGGG CGGACACGGC 
CGTCACGGCG CCGGCCAACC GCCAGCCGGA 
CCGCCACTTC GGGCATGTCC TCGCCGGTCA 
TTGGCGCCCA CGCGCGCGCC ATGCACGGCC
TGCGCCTGAA AGCACGCGGA ATTCCAGCGC 
GCCCAGGGGC TGTACATTGA TCTCCGGCGT 
CCGGCAGGGC GTAGAGATCG GCTTCTATCA 
ATGTCCATCG ACGTCAGCAA GACGGGACGG 
GCCGACACAT TGACGGATGT GCTCGACCAG
CGAGGGCGAG ATAACTGCCG GCGGCGGTCT 
GTTTCCTCGA CCTTGGGGGC CAGCAGGTAG 
GCTGGTGTAC TTGTGGCTGA TATAGCGCTT 
CCGTAAGCAG GACGGTATCC TTTTCCTTCT 
TCCAGGACGG CGCGCAGGTT GCGTATCGAC 
CTGCAGGATT TCGGCAATCT TCTGCACCGG 
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9151 


CATGACGCGC 


AGGCACTCCT 


9201 


CCGAAAGCAG 


AAACCGGGTT 


9251 


TTTTTCAATA 


CATATGCCAA 


9301 


GTAAGGAATA 


CCTGCATCGC 


9351 


AGATCGTGGG 


CGTATCGGGC 


9401 


CGCAGGGCCT 


GCAGGTTCTG 


9451 


GCGCAACATT 


CCTTGCGCCA 


9501 


TATTGGCGGC 


CAGCGCTTCG 


9551 


ACGCCCAGGT 


CGAAATAGAG 


9601 


GAGGGTGGCC 


GGCTCGAACC 


9651 


TCAGCGGGAC 


GGTGGGGGCG 


9701 


GTGCGGGGCT 


GGCCGTCGGC 


9751 


TTCGGGCGGA 


CGCTGGGATG 


9801 


CCGCGGCGGC 


CAGGGCGAAG 


9851 


CCCAGGCCTG 


CCGAGATCGC 


9901 


CAGCACTTGT 


GCGCCGATGT 


9951 


TCTGCACCCG 


CGTCACGATG 


10001 


GGGATCTGCG 


CGATGAGCCC 


10051 


CACGGCCTCG 


CCGGCGCTCA 


10101 


CGCCAAGCAG 


GTTGACGGCA 


10151 


TTGACGAACT 


TCATCGCGCC 


10201 


CTCGACCGTA 


CGGCGTCGGC 


10251 


CGCGCAAGTC 


CGCGTCGATG 


10301 


GAGAAGCGCG 


CGGCGACTTC 


10351 


CACGAACTGC 


ACGATCGTGA 


10401 


GGTTGCCGCC 


CACCACGAAG 


10451 


TCGCCTTGCA 


GCAGGATCAG 


10501 


GAACAGCGTG 


GTGACCAGCA 


10551 


GCGAAGGCAg 


GTACATCGCG 


10601 


TTGGCACCGA 


TCAGCACGTC 



TGACCAGATC 


GGGAAATCGT 


TCTTCCATGG 


TCCTGGATGC 


CGATGAAATC 


GGCTGAATAT 


GTGCCAAGTC 


AGGATCTGGC 


TGATACCCAG


GCAAGGCGCC 


GGTCAGACTG 


GCCGCAACCC 


AGAAAGGCCG 


CGCCCGTTTC 


GTATGCGATC 


CTCGGTGTCC 


CGCACCAGCA 


CGGCATCGTC 


CCGGGATCTC 


CGACAGCACG 


ATGGTGTAGG 


GTGAAGCGCA 


ACTGGATGCC 


GGGAAACGGC 


CGCCCGCCGG 


ATCTGCAGCA 


GATCGTCGGT 


GGGGCTGCAG 


CCGCGCGGCT 


ACGTCGATGA 


AATTCCGCCT 


GCCCATCCGC 


CGGCGCGCGG 


AGCCATGCCG 


GCGAGCGCGG 


GCTCGGCGCC 


CGCGCAGCAG 


TACGAAACCG 


ATGGTGCCCA 


AAGACCAGCG 


TGGGCATGCC 


GGGAATGAGG 


GCCGGCAATG 


ACCAGGGCGC 


GAGGCTGCGC 


CGGTGCCTAC 


GTTGGAGGGG 


CCATCCCCGG 


ATTCCGGCGC 


AGATGGCGAT 


GAACAGCGCC 


GTCGCCTATG 


GTCAGGATGG 


CATATGTCTG 


GGCCGCGCTG 


CAGCACGCCG 


ACCAGCATGC 


ACGATGATCA 


GGCCGGCGAT 


GGCATCGCCC 


GTCCATGGCG 


CCATACAGTT 


GGCTTTCCTT 


GTCGGGCTTC 


GTCCATGTCT 


ATGGTGCCCG 


GACATCTGCT 


TGCCGGGCAT 


GGCGTCCAgC 


GGCCACCCGC 


TCCGCGCCTT 


TGGTGATGAC 


GGATGAGGAA 


AACCACCAgG 


CCGACGATCA 


TTGCCGAAGG 


TCTCGATGAT 


GTGGCCGGCA 


CCGCGTGGTC 


GCGATGGAGA 


TGCCCAGCCG 


GGACCGAAGG 


GAACGAGGAA 


AACGCCAGGG 


ACCATCAgCA 


GGACTGCCGA 


CAGCGTCATG 


GACCAGCGTT 


GTGGGCAACG 


GCAGGATCAT 
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10651 CATGAAGACG ATCGCCACGA 

10701 TGGTGGCCAG CGCCACCGCG 

10751 GTCATGGCGG GGTCGCGCTC

10801 TCACGCATGG CCTGGCGGGC 

10851 CTGCGCCCGG ACCAGATGGC 

10901 GCTTGTCCAG CGTGACCAAG 

10951 CCCAGCGCCA GGGCCAGCAG 

11001 CAGGGCCGCC AGCAGGGCAA 

11051 AGTGGTGCGC AAGCAATTGC 

11101 GGTAGGGTAT GCGGCATCAT 

11151 GCCAGGTCGC GCAGCTTTCC 

11201 TAAGGCTGGC GCGGTGTTGC 

11251 CGTCGCGCGC CTGGGCCAGG 

11301 GCGCCATCCG TCAGGCGAGG 

11351 GTCCGGGCGT ACGAGCAGGG 

11401 GCGGTTCGAG CCAGCGCTCG

11451 GGACCGCGCA CGATGTGGTC 

11501 ATGCATGCCG GGTACGGCGC 

11551 AGCCAGTTCG TCAGGCCCAG 

11601 CGAGCGTTCG GGCAGTCGTG 

11651 CCGACCAGAG TGACGTCTGG 

11701 GCCCGTTTCC ATGCCGCCAG

11751 CTGGGCCAGG TGGACCAGGG

11801 GGCGCCCGTT GGACAGCGCC 

11851 AGGCCCTCGA TGCCGATATC 

11901 AGTATTCATG CGTTCTCCAT 

11951 GGCCGCCAGG ACGGTGGCGC 

12001 GCAGGTCTTT GAGAATCTGG 

12051 GAGAGGGCAT TCGCACCGTG 

12101 CGCAATCCAT TTGTCCTCGC 

12151 GGGCGTCCGC ACACTCCTGC 



TGAGCACGGC CAGTACGATG TCGTTGCGGC 
CGTTGCAGGC GGCGAATGGA TTTCTTGCTC 
GCGCAAGACG CCGCCCGTAG CGCCATATAG 
GTCGTCGGGT CGGTTCAGCG CCTGCATGGC 
CGGCGGCATC GGGTGTGGCG CGCAGTGCGC 
GCCATGCGCG GTTCGCCCTG GTGCAGATAG 
GGACTGGCTG TCGATCGCGT CCAGGGCATC 
CCGTCTTGCT CCATTGGCGC TGCAACTGGT 
AGAAGTTCGC GTACCTGAAG GCTGGGCGAG 
CCCTGGTAGA GCGCGCTGCG ATACATGGCA 
GGCCTCGTTG AGTACGTGCA AGGCGCGGCT 
CGGCATGCAG TTCCATGGCC CGGGACAGTT 
GCGCGCTCGA ACTGCGCGGG CTGGAATAAT 
CCGCGCCGAC TCGTCGAGCA TCGCGCTCAG 
CTTTCAGATG ATCGACCGCG CCGGTGGCGG 
GGTGGCAGGG TGGGGGCGGG CTCGCAGCGG 
GACGCCGCGC TCCAGGCCGA GGTGCATGGC 
TGGACATGGC GTCACGCCTC CAGGCGGTCG 
CACCGCCAGG CGCAACGCCG CTGCGTCGAG 
TCTGCGCGAC GATCCAGTCC TCGCTGCCCT 
ATGGATGCGG CGCTTCCGCG CTGCCCATGG 
CAACACGGAG GCGGCGTCGC GCTCGACCCG 
CCGCGCCGGC GACGCATTCG ACGCCCAGGC 
AGCGACGCCG ATCCGGACGG CCCGAATGCC 
CTGGCCGAAC TGATGCAGCG CCCTATCGGC 
TGCTATCGCG TTGTCCAGCG CGTCCTGCGC 
GCACGTCCAT GTCGGCGTAG ATCTGCGTGG 
CGTACGCCAC CGAGGAATGC GATGCGCTCG 
GCGCTCGGCC AGCTTCTCGA AGCGCGCGGG 
TGATTCCCAC AAGATCGCGC ATCAGGCCCT 
GAGCCTGCGT TGCCCAACCG TTGTTTCAGG 
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12201 GCATTGCATT CCTCCAGTAC 

12251 GCTCGCCAAC ACTTGCAGCC 

12301 CCAGGTCGTG TCCCAGCGCC 

12351 TCGTCGTTCC CATAGCGTTC 

12401 CAGCTGGCCC AGGGCGATGT

12451 CGTCAGCGGA ACGCGCGAAT

12501 GCGCGGATTT CGGGGCCATG 

12551 CAGGGCTTCG AGCGCGTGCG 

12601 GCGCGTGCTG CAGCGCGAGG

12651 GCGGCCAGCT GCATGGGGGC 

12701 GGCTTCCAGT TTTGCCTGCG 

12751 CCGCAAGCTG CGCCGCGTCC 

12801 TTGCGTTCGG AGTGATGCTT 

12851 GAGCTCCTCG GCCGCGTCCG 

12901 TGCGCTGGCC TTGCAGCCAG

12951 CGCCCCTGCA TGGCGGCGTG 

13001 CATGGGGAGT CCTCGGAGAA 

13051 TCGCGCGCGC GGTCATGGTT 

13101 GCAGCCTGGA ACTTGCCGCG 

13151 ACTCGAAGGA GCTCTCATGA 

13201 AGGCCGGCGG CCTGCAAGGC 

13251 CTCATGGTGT ATGTGCAGGG 

13301 GCAGACCCAG GCCGAAGTGG 

13351 TCAACGAGGT CCTGTCCGCG 

13401 AATCCGAAGC CGGGCGACAC 

13451 CCGGATCGAG GTTCCTCTCA

13501 GCATGTTCGA AGCGCGCGAT 

13551 ACGCAGGTCG TGAACGGCAC 

13601 GGAACTCGAA AGTGCCTACA 

13651 CCAATACGCA ACAGATGGAC 



CGTGGCGGCC ACCTCGACTT GATAGAGATC
TGACGCCGTC CGTCGACGGT GTCGCCGCGG 
TGAATCAGCG CGCCCAGCGC GCCGTGGATG 
CAGCACCAGG TCCAGCGTGC GCGCCAACGA 
CGCGGTACGC GTGCTGGAAG CCGGCCAGCT 
GCGCCGGCCG TGGGCAGGGT GTTGATGCCG 
GGCGAGCTCC AGGTCGGCCA ATGCATCGCG 
GCGCGGCGTC CTCGTGCTCG CCGCGCTGCA 
TATTGCTGCG TGACACCGGG AAACGCTTGC 
GCCCCGGCCG CGCAGCAGCT CGGCGGTCAG 
CGTCGGGGTC GTGGGTGTGG GAAAACAGTT 
AGCCAGAGCA TCGGACGTTC GGCCGTGACC 
TTCCTCGGCA GCCTGCGCCA TGTGCAGGCT 
CCAGCGATAT GCCGGTGGGC GCCGGTGCGA 
CCGGAGGAGG TGTTGGCCGA GGCGTCGTGG 
GAAGGGATTG GGGGCGGCAT CGATACGAGT 
GGAACCATTT GCCTACTGGT GCAGTGAGTG 
CCCGGAAACG GGCGCGATAT TGGGCAATTC 
GGCGCAGGGT TACTCAGCAT GCGTCTTTCA 
GCATTGATCT CGGAGTTTCA CTCACGTCGC 
ATCGACCTCA AGAGCATGGA TATCCAGACT 
TCGTCGCGCC GAACTCCTCA CGGCTCAAAT 
TGCAGAAGGC CAATGAACGC ATGGCGCAGC 
CTGTCCCGGG CCAAGGCCGA GTTTCCGCCC 
CATCCCGGGC TGGGACAGCC AGAAGATCAG 
ATGATGCGCT GCGTGCCGCC GGCCTGACGG 
GGCCGGGTGA CCGGCCCCGA CGGCCGGGGT 
GGGCGTCATG GCCGGTTCCA CGACCTATAA 
CCACCGTAAA GGGGATGCTG GATACGGCGT 
ATGATCAGGC TGCAGGCCGC CAGCAACAAG 
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13701 CGCAACGAGG CTTTCGAGGT CATGACCAAC ACCGAGAAGC GGCGCAGCGA 

13751 CTTGAACAGC TCCATCACCA GCAACATGCG CTAAGCGCTG CACAAGGAGT 

13801 ATTCCATGCA GGAGCAAGGC ATCCAATCCA TCATGCGCGC CGCGGAAGAG 

13851 CTGGTCGAGC AGACCCGCCA GGCGTTGTAC AGCGTCGACG AGATCTACGC 

13901 CCACGTTGGC GTCGACCCCG CTCGCCTGCG CAATCTGGCG GTCGAGCAGG 

13951 CCAGGATAGA GGCCGAGGCC CAGGCGGCGT TCCGTGATGA CCTCGCGGAC 

14001 ATCGAGCGCG AGGCGGCGCG CGTCAAGGCG GCCTGCACCG ATGCGCCGCA

14051 GGCCCGCAGG GTGCTTCACA ACCACGTCTG AGCGCGGAGG CCTTCCATGC

14101 CAAAGTCAGC CGACCAGGGC GGCTCCCCGG CGTCAGCTTC GCATGAGGCG 

14151 TTGCGCCATA TTCTCGACGC AGGCGCTTCG ATGGGGGGCT TGCAGGGGTT 

14201 GGACGAGGCG CAGCAGCAGG CGTTGTACGC GATCGGTCAT GGCGCCTACG

14251 AACAGGGGCG CTATGCCGAC GCGTTGAAAA TGTTCTGCCT GCTGGTCGCG

14301 TGCGATCCGC TGGAAGCCCG TTATCTGCTG GCCCTGGGCG CCGCGGCCCA

14351 GGAGCTGGGG CTGTACGAGC ATGCCTTGCA GCAATACGCG GCCGCGGCGG

14401 CTTTGCAGTT GGACTCCCCC AGGCCCCTGT TGCATGGCGC CGAGTGCCTG

14451 TATGCGTTGG GTCGTCGCCG CGACGCCCTG GATACGCTCG ACATGGTGCT

14501 TGAGTTGTGC GGCTCGCCGG AGCGTGCGGC CCTGCGCGAA CGGGCCGAGT

14551 TGCTGCGCAG GAGCTATGCA CGTGCCGACT GAAACGGCGC CATGTCCGCC

14601 GTCAAGATTT CAATTCGAGG AGGTTCGATA TGTCTGTTTC TCCGACTTCG

14651 CCCGGCTCTT TCGGGGCCGG CCCTGTCTTT GACTCCGAAT TGCAGGCCCC

14701 GGCCCCGTCG GCGCAGCGTC GCGGCGGTGC GGCGCCTGTG CCGCCGCCCG

14751 TCGATCGGCG CGGCGTCGAG CCGGGAGATC CCACGCTGGG CATGCTGCCC

14801 GCGCCAGATT TGCTCGCGGG GGGCGCCGTC AGCCGCACCC GCGCGGCGCT

14851 CGACGATCTG GACGCCGCAC GGCTCGGTGA AGACATCTAC GCCTTGATGG

14901 CGGTGTTGCA ACAGGCCAGT CAGCAGATGC GGGACGCCGC CCGTATCGCT

14951 CGTGATGCCG AGGCTACGCG GCAAACGCAG GCTCTCGGCG ATGCGGCCAG

15001 CCAGATGCGC CAGGCGGCGA GCGAGCGCAT GGCCGGAGCG ATCGTGGCGG 

15051 GCGCCATGCA GATAGCGGGT GGTTTCGTGC AGCTGGGGGC GGGCCTGGCA 
15101 GCGGGTTTGC AGGCCATGGG TGGCGCTGCT GCGCAAGCCA AGGGCGCCGC 
15151 ATTCTCCGAG CAGGCCTCGA CAAGCCGCAA GGTGGCGGCC GGCTTGCACG 
15201 ATGCCCCCGA GCTGCAGGCA ACGGTGCAGG CCCGCGCAAC CCAGCTCGAA 
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15251 


GCGCAAGCGG 


CCTCGTTTGG 


15301 


GCAGCGCGTA 


TCGAGCGTTG 


15351 


TCGGCGGCCT 


GACCAGCGCC 


15401 


GCCAGGCGCG 


CGGAGCTGGA 


15451 


GCGGCGGGCC 


GACGAAGCCA 


15501 


TCAGGGAAAA 


GCTGGCCGGG 


15551 


AGCGTGGCCC 


GCAATATCTG 


15601 


ATGACCGTCA 


TGAGTACGAC 


15651 


TGCGCCGTCT 


CGCATAGATA


15701 


AAGGCGCCGG 


TATCCTGGCG 


15751 


CGGCCGGCTT 


TGCCAGCGTC 


15801 


TCCGCCAGTG 


CGCGATCTCA 


15851 


TCTTGCGATC 


CAGGGCGGTG 


15901 


CTGCAGGATG 


CGCAAGTCAA 


15951 


CAAGCTGGAC 


GCATGGTTTC 


16001 


GGCTGAGCAA 


GGTGTTCGGC 


16051 


TCGGCCCTGG 


CTGTGGGCTT 


16101 


GGCGGCCACG 


CCCATGCTGG 


16151 


TGACATCGCT 


GGCCGACCAG 


16201 


AGCCTGGGCG 


GGTTTCTCTC 


16251 


GGGGGTGGAT 


CAGTCGCAGG 


16301 


TGGCCGTGCC 


CGCCGTCTTG 


16351 


GCCGAAGGCG 


TGGCCAGGCT 


16401 


CATAGCCATG 


GCGATGTCCA 


16451 


ATGCCGCCGG 


TACGGCCGGC 


16501 


TGGGATCGGG 


CCGCCGCGGT 


16551 


AGTGGCGCAA 


GGCGGCGTCG 


16601 


CCGATCTCCT 


GGTCGCCGAC 


16651 


CTGCGGGCGG 


CCATGGAGCG 


16701 


TCAATTCGAC 


GCGGCCTATC 



TGCGGACGCG GCTCGTTCGT CGGCAAAGTC
CCCAGGCCGG CGCCGCAGCG GCCGGCGGTA 
GCCCAGGAAC GCCGCGCCGC CGAGCACGAG 
CGTCGAAGCG AAGGTGCATG AAACGGCCTC 
TGCAGCAGAT GCTCGACATC ATCCGCGGCA 
ATGGAGCAGT CCCGCAGCGA GACCGCCCGT 
AGTGTCCGGC TCCAACCTTC AATCTTGAGG 
CATATCCACA GCCCCGAGCG GCGCCGCGCT 
TGCGGGCGCC GGAGCCCGGG AGTGCCGGCG 
CCGGTGACGA CGCTGGCTCT GGCGGCGGGC 
ACCGTCGCTG CGCACCGCGC CCGTCCTGGA 
GCCCCGCCGA CTTGGCCGAC CTGCTGCGCG 
GACGGGCAGT TGGCCACGGC GCGCGAGAAC 
GGCGAAGCAG AACACCCAGG CCCAGCTCGA 
GGAAGGCTGA GGACGCCGAG AGCAAGGGCT 
TGGATCGGGA AGGTGCTGGC GGTCGTGGCA 
TGCTGCCGTC GCCAGCGTGG TCACCGGCGC 
TGCTCAGCGG CATGGCATTG GTCAGCGCCG 
ATATCGCGAG AGGCGGGAGG GCCGCCTATC 
CGGGCTGGCC GGACGTCTGC TGACAGCGTT 
CCGACCAAAT TGCCAAGATC GTCGCCGGCC 
CTGATCGAAC CCCAGATGCT GGGCGAAATG 
GGCGGGCGCC GGCGATGCCA CCGCGGGATA 
TCGTGGCGGC GATCGCGGTC GCCGCGATCA 
GCGGGCAGCG CCTCGGCGAT CAGGGGTGCC 
AGCCACCCAG GTCCTTCAGG GGGGTACGGC 
GCGTGTCGAT GGCAGTCGAT CGCAAACAGG 
AAGGCGGATC TGGCGGCGAG CCTGACAAAA 
TGAGGCGGAC GATATCAAGA AGATCCTGGC 
ACATGATCGC GCAGATGATC AGCGACATGG 
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TAGCGCCGGG 


CGCTCAAGGA 


16851 


GGGGCGCTGC 


GCGCCCGGCT 


16901


AGCCGAGCAA 


TTGGAAGTGA 


16951


GGTGCGAGTA 
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CTCGATCTGC 
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ACGAGTTGGT 
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CTGGCGAGAA 


TGTCATGAAC 


18251 


GGCGCGGGAT 


GGGCGGCCCT 



AGCGCCAACC TCGGACGGCG CCAGGCGGTG 
ATTTTCATGA CTGTTCATGA CGACGCGGCC 
GGATGCGTTG CCGGGCAGCC GGCGCCTGAC 
TTTACGCGAT GGCGTATGCG CACGTCGCCA 
CTGCCCATTT TCGCCTTCCT CGCGCAGTAC 
CTGGGCCGGC CTGGCGCTAT GCCTGCAGAA 
CGCGCAATAT CTATGCGTTG ATCCTCACGC 
GCCGTGTTGC GCACGGCCGA GTGCGAGCTG 
GGCGCAGGCG GCCCTGTTCG GCGCAATCGC
AGCCAGGTCC GGTCTCGCAC CGTGCGCGCG 
GTTTCACATC CGGAGTAACT CCATGCACTC 
GCTCAGACTC AGGCTCaGGC TCACCCATGG 
GAACCGATAC AGCCGATGGA GCATGTGCTC 
GCTTACCGAA GTGGGTTTTC TGGCGGCGGC 
CGGACGCCAT TTTCAATGCA TTGCAACGTG 
CCCTGCATCG GCCTGGCGGT CGCCCGCATG 
AGCCGCCGAG ATCCTGGCGA ATTTCCAGCC 
CGGAACTGGA CGCCTGGTGC GGGTTCGCTC 
GACGAGGCGC GCCGCATGCT GCAGCGAGCC 
GGCAAGGCTG GCGCAGGTCG TGTTGGACAG 
CCGCGCCGTT GCAGTCCGAG CCATTACCTG 
ATCTGACGGC GATCAACGCC GTGCAGGAAC 
GACATGCCGC GATCTCCCGC GATGGCGGAT 
GCTGGGCGAG ATGCCCGGCG CATCGGCCCC 
CGCCGGTCGC GCTCGACGAG CCGCTGGGCC 
CGCGGCGGCC TGGCCGATGT GGCAGGAAAA 
CTTGGCCGAG GTGAGCCAGG CGCCTACCGT 
AGGCCAGGTT GCTACAGGCA TCCGTGGAGT 
ATAGGGCGCG CCACCCAAAA CGTCGATACG 
GCCATCGGGG CGATCCAACG GTATCGGCGC 
GGTGCTCGCC CTGGCGCTGC TGGCCGGCTG 
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18301 CGGTGCCCGC GTCGAGCTGT TGGGCGCGGC GCCCGAGAAC GAAGCCAACG 
18351 AAGTATTGGC GGCGCTGCTC GAGGCAGGCA TCGCTGCGCA GAAGCAGTCC 
18401 GGCAAGGCCG GCTACGCGGT TTCGGTGCCG GCCGAGGCGG TGGCCCGGTC
18451 GCTGGAGATC CTGCGCGCAA GCGGCCTGCC CCGCGAGCAG TTCGACGGAA
18501 TGGGACGCAT ATTCCGCAAG GAAGGCCTGG TTTCATCTCC GCTCGAAGAG
18551 CGCGCCCGCT ACATTTATGC GCTGTCTCAG GAATTGGCCG ACACCCTGTC
18601 GCAGATCGAC GGCGTGCTCA GCGCCCGCGT GCACGTGGTG CTTCCCGAAC
18651 GCGGCGCGGT CGGCGAGCCG GCCACCCCTT CGACGGCAGG GGTGTTTCTC
18701 AAGTACCGCG ACGGACAGAG CCTCGACGCG CTCGTGCCCG AGATCCGCAA 
18751 GCTGGTCACG CATGCCATCC CGGGCCTGGC CGAGGACCGT GTATCGGTTG 
18801 CCCTGGTGGT GGCCCAGCCC GTTCAGGCCG CACCCGCGCC GGTCGCGTGG 
18851 CGCCGCGTGC TTGGCGTACA GGTCGCGGAC GGATCGGTCC TGAGATTTTC 
18901 GCTGTTGCTG CTGTTGTTGC CGGTGCTGTG CCTGATAGTG GCGGGGgCCG 
18951 CGCTCTACGT CTGGCGCACG CGCTGGTCCC GCGGCGAAGG GCGCGGCGGC
19001 GCTGGCGCCG GCGCCACGGA AGGAGCCGGG CATGACTGAG GCGAGCGTGC 
19051 TGCTTTCCGA GCGGCTCATG ATATTCAATC TCCTGCCCAG CCTGACCCTG 
19101 CATGCCAGTC GCCACGACGA GATGTTTCCA GCCGATTGGG TGCGCGCGTT 
19151 GTGCAATGCC GACGCGGCGT TGGCCAACGC GTGGCATCGC CATTGGTCGC 
19201 GCTGGATCTT GTGCGAATTG GGCCTGCTGA ACCAGCCGGT CCTGAGCCTC 
19251 GATCCGCCGC AGTTGAAGGT CGCGCTATTG TCCACGGACG CCTTGCGGAC 
19301 CTGCGCCGCC CATGCGGGAG CGCTGCTGTG CGCGCCGCGC CTGCGACGCG 
19351 CGATAGACGG CGCCGAGGTC CGTACCTTGC ATGCCGCGCT CGGGCGCGAT 
19401 GTGATGAATT TCGCCGTGTC TTCCGCGGCG CGGGCCCTGC ATGACGGGCT 
19451 CGCCGCCAGT TCGGACTGGA CCCTGGCCGC CACGGTCCAG GCGGCGCAGA 
19501 AACTGGGCTG GGCCCTGCTG CGCGACGCCG TGCAGGGCGC CGCCGACGAG 
19551 ATAGCGCTGC GTTGCGCGCT GAAGTTGCCG CGCGACCTTG ATCCCGCGCC 
19601 CGTCCTGCCG CCCGAGGCGG CGCTTGCGCT GGTGCTGTCC ATGCTCGAAA 
19651 TCCTGGATGC AGAATGGCTT TCCTCGTTCC CCGCCCAAGC CTGATCCAGG 
19701 CGGTACGGCC CGGCCGTGCG GATCCCGCGA CCGACGTCTT GCGCGCTGAA 
19751 GACTACGCCG AGCTGCTCAG CGCCGCGCAG ATCGTTGCCC AGGCACACCG 
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19801 GCGGGCCGGC GAAATCGTGG CCGAGGCGCG AGAGGAGTTC GAGCGCGAGC 
19851 GCAGGCGAGG CTATGAGGAG GGGCGCCGCG AAGCGCTTAC GGATCAGGCG 
19901 GAGAAGATGA TAGAAACCGT AAGCCGCACG ATCGACTACT TCGCGGGTAT
19951 CGAGAACGAG ATGATCGAAC TGGTCATGAG TGCGGTCCGC AAGATCGTCG 
20001 ACGGTTACGA CGACCGCGAG CGCACCGTGA TCGCCGTGCG CAACGCATTG 
20051 GCGGTCGTGC GCAATCAGCG CCAGATGACC TTGCGCCTGC ACCCAGACGA 
20101 GGTGGATGTG CTCCGGGAAG GCATGAACCA GCTTCTGGCG GCCTATCCGG 
20151 GCGTGGGCTA CCTGGACCTG CTGCCCGACG CCAGGCTGGC GCCGGGAGCC
20201 TGCATACTGG AGAGCGAGAT AGGCATGGTC GAGGCCAGCC TCGAGGACCA 
20251 GCTGTGCGCC TTGCGGGCGG CCTTCGAACG TACATTCGGC CgGCGCGGAT 
20301 AGGGGCATGC GTCaGTACCa CTACATCACG GAgATGATGC GGGTGGCCCT 
20351 GCAGGATCTG TCCACGCTGC GGATAAAGGG CCGGGTGGTG CAAGTGGTGG 
20401 GAACGATCAT CAAGGCCGtC GTTCCGATGG TCAAGATCGG CGAAGTGTGC 
20451 CTGCTGCGCA ATCCCGGCGA GGACTTCGAG ATGCACGGCG AAGTGGTGGG 
20501 CTTTGTCCGC GACGCCGCCT TGCTCACGCC TATCGGCGAC ATGTACGGGA 
20551 TTTCCTCGGC GACCGAGGTG ATACCGACCG GACGCACGCA TATGGTCCCC 
20601 GTCGGTCCGG GCTTGCTGGG ACGCGTGCTG GACGGGCTGG GACGTCCGCT 
20651 GGACGCCGCC GAGTCAGGGC CGCTGCATGC CCACAAGTTC TATCCGGTCT 
20701 TCGCCGATGC GCCAGACCCG CTGACGCGTC GCATCATCCA TGCTCCGCTG 
20751 GAGCTGGGGG TGCGCGTACT GGACGGTTTG CTTACATGCG GGGAAGGCCA 
20801 GCGTCTGGGA ATTTTCGCAG CCGCCGGCGG CGGCAAGTCG ACCCTGCTGG 
20851 GCATGCTGGT CAAGGGCGCC GCGGTCGACG TGACGGTGGT GGCGCTGATC 
20901 GGCGAGCGTG GGCGGGAAGT TCGCGAGTTC CTTGAGCACG AACTCGGTCC 
20951 GGAGGGCAGA CGCAAGAGCG TGATCGTCTG CGCGACCAGC GACAAGTCCT 
21001 CGATGGAGCG TGCCAAGGCG GCGTACGTCG CAACCGCCAT CGCCGAATAC 
21051 TTCCGCGATC AAGGGCAGCG TGTACTTTTT CTGATGGACT CGGTCACCCG 
21101 CTTTGCGCGA GCCCAGCGTG AAATCGGCTT GGCGGCAGGC GAGCCGCCGA 
21151 CGCGGCGCGG CTATCCACCG TCGGTGTTCG CCACCTTGCC CAAACTGATG 
21201 GAGCGCGCCG GCATGAACCA GACGGGTTCG ATCACGGCGC TGTATACGGT 
21251 GCTGGTCGAG GGGGACGACA TGAACGAACC GGTGGCCGAC GAGACGCGTT 
21301 CGATACTGGA CGGCCACATC GTGCTCTCGC GCAAGCTGGG AGCGGCGAAT 
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ACCGTCTATA 


TGCCGAGCTG 
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GACGAGGTGC 


TGCAACGAGT 
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GGCGCTGCAG 


CTCGACGACG 


21951 


TGCTCGCGCA 


GCAGCGCGAG 


22001 


CGGATCGCCG 


AGTTGGTGCG 


22051 


CGAGAGCCAG 


GAAGATCGCG 


22101 


GTGGGCGCGA 


CGATGCATCG 


22151 


ACCAGCCAGA 


CGGGCTGGGT 


22201 


GGCGTGGCGC 


GCACGCCGTA 


22251 


TGCCTTCGAG 


CGGGAAATGG 


22301 


GGCCGCAACG 


CCTGGCGCCG 


22351 


ATGGAACCTG 


CCGCCGGCCG 


22401 


AAGCGTGGCT 


GCGGGGCTGG 


22451 


AAGCCCGTAT 


CGTTGTGGAC 


22501 


GTACGGGAGG 


ACGGCGGCTG 


22551 


GGACGCTTGC 


GAGCGCCTGC 


22601 


TCGCGCTGGA 


GCTGGCGCGC 


22651 


GAGCCGCACG 


AGCGGGTGGC 


22701 


GTGGCCGGCG 


GGGCGGCGGC 


22751 


TCCGCGGTTG 


AGCGCCGGCG 


22801 


ATGGCGCGCG 


ATTCGACGTT 



GCTGGCCTCG GCCAGCCGGG TCATGAATGC 
AGTACCTGGC CGGACGTATG CGCGAACTGA 
GAGCTGTTGG TGAAAATCGG CGAGTACAAG 
CGATGAGGCG ATACAGAAGA TCGGACAGAT 
TAACCGACGA ACGCGAAGCA TTCGAGGATA 
ATCATCGGAC CCGAATCCTA ATGGACCTGG 
CATTTTCGCG CCGACCAAGC CCAGCTTGCG 
CTGCGCGGTT GCTGCCGCGG CGCAGCGTCA 
ATTGTCGCCT GTGGGCCGGA CAGCTCGAAA 
TGCCGGCGCA TCGTCAAGAC ACGCGACATC 
GGGCCACGCC CGCGACCGCC AGGCCAGCCT 
CCGTGCGCCG TCACGAACAT GAAATCCAGC
CAGCACCGGG AGTGCTTCCA GGCGCAGCAA 
CCTGCAGCAG GTCGAGGCGG CGGCCTTGCG 
AAATTCAGGA AGCCATCGAA TTGTCGGCGC 
CGAGCCGGCG ACGGCCTGGC GCGGCTATGA 
TCGCCCATGG CCGGCGGCGG GCAGCGCATG 
TGCGCGTCAG CCGGATCGGG ATGCGCAGCG 
AACAGGAGAA AGCGAAGGAA GAACTGCCCG 
GGTCCGGCCT GCGTCGGCTG GCTGGCGTCG 
TCCACCGGCC AGTCTGGCCC AGGCGCTGGC 
CGGTAGGCGA CGTGCTGGAG GGGTATCGCG 
GATACGCTGC TACCCGACAC CACCTTGTCG 
GATCGTGGTG GCTTTCGCAT GCCGACAACG 
ACGCGTGCGC CGACCGGTTG GCCATGGAGC 
GACGTCGAGG TTGCGGTGGC ATGCGACGGC 
GCGCGCGCAG CGGCCGTGGC GATGAATCGA 
GCAGGCCGCT GGCATGGTGG ATCTCGCGGT 
AGGCCCATGC CCTGTCGAGG ATTGCATGCC 
CGGCTTGGCG AGCCGGCCGT GCGCTGGCAC 
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22851 


TGCGCCCTGA 


CGCCTTGCGT 


22901 


AAGCCTGCAA 


CTGCAATGGG 
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GCGCGGCCGC 


GGCGGGATGG 


23001 


GTGGAGTTGC 


CGGAACCCAT 
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GGAGGTCTGT 


CGAGGCGTGG 
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TGGCGCGGCA 


AGGCGGGACG 
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CGCCTGACGG 
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CCGATGGTCG 


GGTCAATCCG 
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GTCGGCGCGG 


CGTCCGTGAC 
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CGACGTCGTG 


TTGCTCGCGC 
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GGTTGTCGGC 


CGGACCCAGC 


23451 


TTTCGTGTAA 


CTCAAGGTTG 


23501 


TGACCCTGGC 


GAAACCCCGG 


23551 


AGATACCCGT 


GCGCCTGACG 


23601 


GCGCAGCTGC 


GCAGCCTGCA 


23651 


CATCGCCGAC 


GGGCCGGTCA 


23701 


GCGGCCGGCT 


GGTCGACATC 


23751 


GTCAGGCCTG 


GACTCGCATG 


23801 


TTTCTGGCGC 


TGCTGGCGCT 


23851 


GTTCCTGAAG 


ATCGCCGTCG 


23901 


TGCAACAGGT 


ACCGCCCAAC 


23951 


TCCGCGTATG 


TGATGGCGCC 


24001 


GGCCTTGACC 


GCGCAAGCCG 


24051 


TGGACGCCGT 


GCTTGGCGTG


24101 


TTCATGTTGC 


GCAACAGCCA 


24151 


AGCGCGTCAT 


CTCTGGGGCG 


24201 


ACCTGCTGGT 


ATTGACGCCC 


24251 


TTCCAGCTTG 


GCTTTCTGCT 


24301 


CGTATCGAAC 


ATTCTTCTTG 


24351 


CGATCTCCAT 


GCCGTTGAAG 



GCACGGCGAC CTTGCCGATG GCGAGATGGA 
CCGGGACGTA CATCGGCCTG ACGGTTCCGC 
CTGGCGGCGC GCCTGCCCCG GTTTTCCGGC 
TGCGGCGGCG GCCCTGGAGG CAATGCTGGA 
CCGGACTCGA CCAGCAAGGC CCGGTCCGCG 
CCACCGGTCC AGCCGCATCG CTGGACCCTG 
TGGCGTCTGG CGCGCGGTAC TGGCGTGCGA 
TCGCGGCGGC GCTGGATTCC GTTGCGCCTG 
GAGCGCGTGC CGGTCAGGTT GCGTGCCGAT 
CGCAGGCCAG CTGCGGACGC TGCGAGCGGG 
AGTACCGGGT GAGCGATGCC GCAGAACTAT 
GCGATCCGGG TACGGGCCGA GCATGCGTCT 
GACTCCCATC ATGACGGAAC CCGCGACACC 
CACAGGCCGA CGCGACGCTC GATACCGATC 
TTCGACCTGG GCGAGCGCGA GTTCACGCTT 
TCCGGGCTGC ACGTTCGACC TCGAGCGGCC 
TGGTGCGGGC CAATGGCCTG TTGCTGGGCA 
GACGGCCGCA TCGGCGTGGT ATTGCAGTCG 
AGCGATACCG ACCCCTTCAG CCTGGCCCTG 
GGTACCGCTC ATCGTCGTCA TGACCACGTC 
TGCTTGCCTT GGTGCGCAAC GCCCTGGGAG 
ATGGCCCTGT ACGGGCTGGC GCTTATTCTT 
GGTCGTTCAC AGGATAGGCA CCGAGGTCCA 
GGGAGTCCGG CACCGCCGCG CCGATGGCGC 
GCCGAGCGAG GCGTGGGGCC GCTGCGGGCC 
GCCGGCCCAG CGTGATTTCT TCCTGCGCAC 
AGGAGGCATC GCGGGACCTG TCGGAAGACA 
GCATTTCTGG TTTCGGAGCT GACCGCCGCA 
GTACCTGCCG TTCATCATCA TCGACCTCAT 
CCATGGGAAT GATGATGGTT TCTCCCGTGA 
CTGTTCCTGT TCGTCATGGT GGACGGCTGG 
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24401 


ACGCGCCTGA 


TCCAGGGCCT 


24451 


ACCCAAGACC 


TGGTTTCGTT 


24501 


GCTGTCGCTG 


CCGCCCATCG 


24551 


CCCTGTTGCA 


GGCCTTGACG 
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TAGCCGTGTT 


24651 


AAGCGCGGAA 


ATCTATAACT 


24701 


GGATCCACTG 


AGCGGCCAAT 


24751 


CGAAGGTTTT 


CCTGGGAACG 


24801 


GCCATGCTCT 


TTCTGCCGAT 


24851 


GCGTTACGCC 


GTCGGCGCCT 


24901 


CGCCGCAGTA 


TGCCGCGCTG 


24951 


CTGGCCAAGG 


AGGCGATGGT 


25001 


GCCATTCTGG 


ATCTTCGAGG 


25051 


GCGCCAGCCT 


GGGCGCTATC 


25101 


CCCATGGGCA 


TTCTCTTCAA 


25151 


GGGCGGATTC 


GGGTTGTTCG 


25201 


GGAACATCTG 


GGCGTGGTGG 


25251 


ATGCTGGACC 


AGTTCAGTGG 


25301 


GCCGGCCATC 


GTGGCCATGT 


25351 


GCCGCTTCGC 


GCCTCAACTG 


25401 


AGCGCGCTGG 


TGCTGTTCGT 


25451 


GTATGCAGGC 


GAAATCCTGG 


25501 


ATTCAGCGTG 


GCCCGGCCCA 


25551 


CGAAGCGCCT 


GCGCGATTCC 


25601 


GACTTTACCC 


AGACGGCGCT 


25651 


CAATGCCCCG 


TCCATTCTCG 


25701 


CGGCCTTTGC 


CGACCAGGGG 


25751 


GAAATCCTCG 


ATCAGGCCGT 


25801 


GCTTGGGGTG 


GGGATGTTCG 


25851 


CGTTTCGAAA 


GCTCAAGCCT 



GGTGCTTTCC TATCGGTGAC CAGCATGCAA 
CATGACACAG GCGTTGTACC TGGTGCTCTG 
CCGTGGTGGC GATCGTGGGA ACGCTGTTTT 
CAGGTGCAGG AGCAGACCCT GTCCTTCGCC 
CGCCACGCTG ATGCTGGCGG CCCGGTGGAT 
TCACGATTGC GGTGTTCGAT GCCTTTCATC 
CGATGCACAC GGAGTTCAAT TTCGTCGAGG 
CTGGCCATGA CGCAACCGCG GATACTCACG 
GTTCAACCGT CAGTTTCTGC CTGGTCCGCT 
GTCTCGGGCT GATCGTGGTT CCCCAGCTGG 
GATATCGACT GGCCCCGGCT GCTGGCGCTG 
GGGCATGTTC CTGGGTTGGC TGGCTGCCTT 
CCATCGGCTT CGTCATAGAC AACCAACGGG 
CTCAACCCCG CCACGGGCAA CGATTCGTCG 
TCTGGGATTC ATGGTGTTCT TCCTGACGGC 
CCACGATGCT GTATGACAGC TTCGGGTTGT 
CCGTCCATGC CCGCACAGGG CGCCGTGCGG 
CTTTGCCGCG CGTGTCCTGC TGCTGGCCTC 
TCCTGGCCGA GCTGGGCCTG GCCCTGATCA 
CAGGTGTTCT TCCTGGCTCT GCCGGTAAAG 
GCTGGTGCTG TACATGGCAA CGTTGTTCCA 
GTTCTGTGGG CCGGATCGTG CCGTTCCTGC 
TGAGCGGCGA GAAAACCGAG CGGCCCACCC 
CGCGAGAAAG GCGAGGTCGC ACACAGCCGG 
GATATGCGCC TTGTTCGGGC ACTTTCTGAT 
CGTCGCTGCG AGCGCTGATA CTGGCGCCGG 
TTCGCCGTCG CATTGGGGCC CGTGCTGACG 
CCGCGTGCTC GCTCCGCTGA TTCTCATCGT 
CCGAATTCCT GCAGGTAGGC GTCGTGCTGG 
TCGGCGGAGA AACTGAATCC CGCCGGCAAT 



Fig. 5 17/23 






25901 


TTGAAGAATA 


TCTTCTCGGC 


25951 


ATGCAAGATC 


CTGTTTCTGG 


26001 


CCTTGCAGCC 


GCTGATGGCC 


26051 


ACGGGCGTAG 


GCCGCATTCT 


26101 


GTACGGGGCG 


ATTTCGCTGG 


26151 


GCAAAGGCTT 


GCGGATGAGC 


26201 


ATGGAAGGCG 


ATCCCCATAT 


26251 


GCTGATCATG 


CATGGCGCGG 


26301 


TGACCAATCC 


GACACACCTG 


26351 


ACGCCCTTGC 


CGCGCGTGCT 


26401 


CATGGTCGAG 


GCCGCGCGCG 


26451 


CGCTGGCCCG 


CGCCTTGCAC 


26501


GGCGAGTTGG 


TGGAGCCGGT 


26551 


ACTCAAGGAG 


CAGACATGAC


26601 


TGCGCGACGC 


CATGGCCTCG 


26651 


CCGTCCGGAT 


CGACGGACGG 


26701 


GGCATGCTGG 


TGTTGCGGGC 


26751 


GGCGCGCGCG 


GCGCAGTTGC 


26801 


CCCCTGCGAC 


CGATGCGTAC 


26851 


TATCTGCACC 


AGCGCGTCGC 


26901 


GGCGGTGGGC 


GAGTTCGTCA 


26951 


CGCAATGGCA 


ATAGGTCGGC 


27001 


GGGGTGTCAT 


GCTGTTGGCG 


27051 


CCTTTGGCGC 


CGTATAGCTA 


27101 


GCTGCGCGAG 


TTCGCCGCAG 


27151 


GGGTGCAAGG 


CGTGGTCAAT 


27201 


TTCATCGAGC 




27251 


CGGCACGCTG 


TATGTCAGCC 


27301 


ATGCAGCCGG 


CGCTTCGCCG 


27351 


GGCATCCTGG 


ACGAACGCTT 


27401 


GGCCATGGTG 


TCAGGGCCGC 



GCGCAACCTG ATGGAGTTCA TCAAGTCGGT 

cbpTGTTGGT CACGTTGGTG ATACGGGATT 

GTTCCCCATA GCGGGCTGGA CGGGTTGCGA 

GCAGGTCATG GTCTGGAACA TCGGACTGGC 

CGGACCTGGC CTGGCAGCGT TACCAGTATC 

AAGGACGAAG TGAAGCAGGA GTACAAGGAG 

CAAGCAGCAA CGCAAGCACC TGCACCAGGA 

CGGCCCAGGT TCGCCGGGCG ACGGTGCTGG 

GCCGTGGCCC TGTACTACGC GGCGGGCGAG 

GGCCATGGGG CAGGGAGCCG TGGCCGCTCT 

ATGCCGGCGT GCCGGTCATG CAGAACGTCG 

GACCAGGCGG AGGTGGACCA ATACATTCCC 

GGCCGCGGTG TTGCGGGCGG TGCGCCAGGC 

AGCAACCATT CATCCCGATA TTGCCGATTA 

AACCCTCGGT CGACGCCGAT GGCGGGCTTG 

CATCGCGTCA GGTTGATCCC CGCCGAAGAC 

GCGGCTGGCC GAGCTGCCCG ATGGGTGGCA 

GCCGGGCGGG CCTGCTGGCC AGCGCCATGG 

TGCGGCATAG ACCAGGGCGA AACCGCGTTG 

ACCGGCCGGC AGTGCGCTGG CGGTGGACGA 

ATGCCTTGGC CACTTGGAAA AGGGCGATGG 

TTGGGTATCT TGTCCGCGGC GCATGGGCCG 

GCCGGTAGCG CCTGGGCGGC GCCGAACTGG 

CTACGCGCAG CAGCAGAGCC TGTCCGATGT 

GCTTCAGCCT GGCGTTGCAA CAGGGCAAAG 

GGGCGTTTCA ATGCGCGCAC ACCCACGGAG 

CATCTATGGG TTCAACTGGT TCGTGCATGC 

GCACCAGCGA CGTGGTTACC CGCGCGGTGG 

TCGGCGTTGC GCCAGGCCTT GCTGCAACTG 

CGGATGGGGA GAGCTGCCGG CGCAAGGCGT 

CGGCCTATGT CGCGCTGGTC GAGCAGGCGG 
27451


TAGCGGCGTT 


GCCCAAGGGG 


27501 


CTCAAGCATG 


CTTCCGTGAG 


27551 


GGTAGTTACG 


CCGGGGATGG 


27601 


CGGGGCCGGG 


CAACGACGCG 


27651 


GAAAATCCGC 


CGGTGTTCGG 


27701 


CGCTGGCGCA 


GCCCAGGCAG 


27751 


AGGCCGACAC 


GCGCCTCAAT 


27801 


ATGCCAATCT 


ACCGTGCCCT 


27851 


GATCGAAATA 


GAGGCCATGA 


27901 


AGCTGGGTGT 


CACCTGGGGG 


27951 


GGCGATCTGG 


GGCTGCGTCC 


28001 


GGCCGACCTG 


GCGCCCGGAA 


28051 


CGGCGCGCTT 


GCGTGCGTTG 


28101 


CAGCCGTCCA 


TCCTGACCGC 


28151 


GGATACCTTC 


TACATTCGCA 


28201 


CTGTCACGGT 


GGGTACGTCG 


28251 


AAGGGAGGAC 


GCCAGGTGGA 


28301 


CTTGCAGGAG 


TATCCCATCG 


28351 


TCAGCACGCT 


GGCGGTGGTG 


28401 


TACAACAATC 


GCCGTGACGA 


28451 


AGATATCCCC 


GGCCTGGGGT 


28501 


AGCGCCGCGA 


GCGGCTGTTC 


28551 


GGCAAGCCGG 


TCTTCAGCCC 


28601 


CACGGGTTGG 


GGCGGGCATG 


28651 


GCGGGCATAC 


ACAAGTGCGT 


28701 


CTGGTGCCGG 


ATTCATTGCA 


28751 


GCCCTGAGCG 


TGGCCCGGGC 


28801 


GTTCGCGCAG 


CGCTTCGCGC 


28851 


CGAATGGTTC 


CTACAGGAAC 


28901 


GAGTTCTTCC 


ACGCCGACCA 



GCCGGCAATC AGCAGGTGGC GGTGTTTCGC 
CGACCGGGTG ATCCGTTATC GAGACCAGCA 
CCACCATGCT GCGCCAATTG ATCCTGGGGG 
GCGCTGGCCG CGGTGGCGGC GCCGCTGCGG 
CGATGCGGCA GCTGACGGGA ACGCGCCGCT 
CCGGCCGGCG CCTGAGCGAG CCCAGCGTGC 
GCCTTGATCG TGCAGGATAT TCCCGAACGG 
GATCGAGCAG TTGGATGTGC CCAGCACCCT 
TCGTGGACGT CAATACCGAT CTGGTCAACG 
GCGCAGATCG GAACCACCAG CCTGGGCTAT 
CGGCAACGGC CTGCCCGTGG ACGGCGCGGC 
CCTTGGGGAT CAGTGTCAGT ACCCGGCTGG 
GAGTCGGACG GGCAGGCCAA TATCCTGTCT 
CGACAACCTC GGCGCCATGA TAGACCTGTC 
CCCTGGGCGA GCGCGTAGCG ACAGTCACGC 
TTGCGTGTGA CGCCGCGCTA TATCGCCGCC 
ATTGGCGATC GATATCGAGG ACGGACGGGT 
ATGGTCTGCC CCGGGTTCGG AAAAGCAGCA 
GGGGACGAGC AGACGCTGCT GATCGGCGGC 
AGAGCAGGTC GAGAAAGTGC CGCTGCTGGG 
TCTTGTTCTC GAGCAAGTCC CGGGCGGTAC 
CTGATCCGGC CGCGTGTCGT GGCTATCGAG 
CGTTGCGGGC ACGTCGCAGG TGTTCATGAG 
GCAGCAGCCT GAGCATTGCA CCCGGCGAGG 
CATGATGCCC GGGCGGGCAG GCCGGTCCGG 
TGTGGAGTAT GGCGAGGCGG GGGAGGCGTC 
AGGGGGGCTA CGGCACGCTG TCGTAGCCTC 
AAGGCACAGC GGGCGCGGGA AAGTCGGCTG 
CGAAAGCAGT GCGGCAGCCT CTTCATAGGA 
TGAGGATCAC GTCGCGCATG CTTTCGGGGA



Fig. 5 19/23 






28951 


GCTGCTCCAG 


CGCTTCGCGT 


29001 


ACGGCTTCGG 


GGTTGGGCGC 


29051 


GgAGGTGAAT 


TCATAACGGC 


29101 


CCAGATTGAG 


CGCGATGCCG 


29151 


CGGAACGATT 


GATACGCGCG 


29201 


GTCCTCGACG 


TCGCTGCTGT 


29251 


ATCGGCCGGC 


ATGTTCGGCT 


29301 


CGCTCGGTGG 


CGATATCGGG 


29351 


GGGAAGACCG 


GATGACGGGG 


29401 


AGGGTGATAA 


TTCACGTCAC 


29451 


GTTGTTAACT 


TCCGAAACTA 


29501 


CAGATACAAC 


GCGCTGAACG 


29551 


GGGACTACAC 


CAGGTGGTGC 


29601 


GTATGGGCGG 


ATACCGGCGC 


29651 


CTGCTGCATG 


CtTTCGTCGA 


29701 


CGGgAGAGAC 


ACCGGTGTCG 


29751 


GCGCGGGCCG 


ACCAGGACAG


29801 


GTGGCGTgAG 


GTTTCCGCCA 


29851 


ACAGCGATGT 


CGTCGGGACC 


29901 


CGGGCGATTC 


CCGTGCCGTA 


29951 


AAGCGTGTCC 


TCGTCGGTGG 


30001 


TCGTGCCGGA 


TGCGATGACC 


30051 


AGGGTCTCTC 


CTTGGCTGGC 


30101 


AGGTAAAGCG 


GGTGGGATCT 


30151 


ATCCTTGGGT 


GTATCGCGTT 


30201 


TGTGACGCTT 


TCGGCTCTTC 


30251 


CATGCAGGCC 


TCGCTCCCGT 


30301 


GCAAAGTGTC 


GGCCATACGC 


30351 


GTCGTGCGAC 


CGGGGCCAGC 


30401 


CGCAACTTGC 


GCCGTCAACG 


30451 


CGCGCGCCTG 


TGCGGGGGGT 



AGCAAGCGCA TGCGTTGACG CTGCTCGGTC 
ACTGCATGGC ATGACACCCA GGCTGGCGTC 
GCTCTGGCGC ACGCGACAAG TGATTGCGGA 
TACAGCCAGG TGGAAAGCTG GGAGTCGCCA 
CGCCGCCTCG GCGAAAGCCT GCTGCGCAAG 
GGCCGATGTG CTTGGCGATG AATCTCTGTA 
ACCAGGCTGC TCAGCAACTG CTGGTCCGAT 
ATATGCGTTG TCCGGTTTTT CGAGAGAAGC 
GAGAAACCAT GCAAAGCGAT ACCAAGTGAA 
CAAGATACTG ACTGCCGGTT TTATCCGGCA 
ATGTCGGATC GCGGTCGCTA CCGGAGCATT 
TCTTCCGTAA AACTTACGAC GCGACGTATG 
GCAAGGACCT CGCGCaaCCA TTCTTGCGCC 
CTTACCTGAT TGTGCGGCGG ATGCGAAGCG 
TCGTGGCGCG CAACGTTCCG GATATCTGAT 
TGACAGACCT GTCGGAAGCC GaCGGCGTCG 
GAAGGTAAGG AAATCCACGC CCTGCAGGGC 
TGTCGACCGC TCGCGTGACG ACGCGCGCCG 
GCGTGCAGAT CGAGCTCCTT GGCGACGGCG 
GGCCAGCGCC AGCGCGCTGG AAAACATGAC
CGACCCAGGA TACGTTGCGC CCTGACGGCG 
TGGAACTGCT CGCCTGCTTT GGTGACATAT 
GGCGCGCGCA AAGACATCAA GCTCCAAAGC 
GGAAGTTCAT GCGGTGGGCC GTTGTCTCGA 
CTACGGAGCG GGAACATGGA ATATGCACTG 
GGTTCCATGC CGGCATGACA AACCCGACGG 
CCGGCCACCG AGGCTCGCCT TGAACGCTGC 
TGGGCCTCGA GCTGGAAATG GTGGTGGCcT 
CATCCGGTGG CGCGCCATTT CGAAGCGCTG 
CGGAGAGTCC GTGCAGGAGT ACCGTCTGGA 
GGGCGGTCCC CATGGCCTGA GCGGGCTTGA 
30501


TAACGGCTAT 


AACCTGCTCG 


30551 


CGGGAGGGCT 


GCGCCGACTG 


30601 


ACGCAGCTCG 


CCCTGGCCGC 


30651 


GCACCCCGCC 


GCCAGCCTGG 


30701 


CGCGGCCCAT 


CTACGAGGAA 


30751 


ATCGGGATAG 


ACGCCAAGGC 


30801 


GGCCATCGCC 


GCGCGCTGCC 


30851 


AGATCGCCAT 


GTTCGCCAAC 


30901 


CTCAAGGAAA 


ACCGCCTGAC 


30951 


CTACCTGGGC 


GACGACCTGC 


31001 


ATCTCGGCGA 


TTATTTCCGC 


31051 


GCGCTACCGC 


CGGGCGACGC 


31101 


CCTGGTGGGA 


GCCCCTTCGC 


31151 


CCGCGCGAAA 


CCTGAATGAT 


31201 


GAACATTTCG 


TCTATTCGCA 


31251 


CTACAGGATG 


CCGATTGTCC 


31301 


ACAGGCAGGG 


CGGCCTGGAA 


31351 


TACATCGAGG 


GGCGCGCGCC 


31401 


GAGCTCAGCC 


GGCGATGCAG 


31451 


CGCTGCAATT 


GGGGCTGTTG 


31501 


AGGCGATGGG 


GCTGGCTGCG 


31551 


TTTGGCGTTG 


GACGATGCGC 


31601 


CGGTAGCCGA 


AGGCGGGCTG 


31651 


GTGCGTTACG 


TGGTGGAAAC 


31701 


CTTGTGGCGC 


CAGGCGCGCG 


31751 


GCCGGCAGCG 


CGCGGTGCTG 


31801 


CTCGTTGGCC 


AGGCCTGTCA 


31851 


CAGCAGAATA 


TCCACCGGTG 


31901 


GATTGATGGC 


TATCTTGGCG 


31951 


GGCTTTTCCC 


CATTGAAAAT 



AAACCGCATT CGCGCCTGTG AACGGCGGCG 
GCCGAAGCGG TGCGCCGCGA GCTGCGTGAC 
GGAGGGGGCG ATGCTGATCA ACGCGGCCGA 
ATGCCGACTG GTACCGGCGA GTGCGGGTGC 
CTCGTCGGCC AGCGAGGCTG GCTGCACCGG 
ACAGAACAGC CCCTGCACGT CGGTTCCCGT
TGAACGTCGT GCTGGCGCTG GCTCCCGCGC 
AGCCCGCTGG AGGCAGGGCG GGTGACCGGT 
CCTGTGGCCG CGCATGTTCC GAGGCGCGCG 
TGCATCGCCT GCCTGCAAGG CCGTTTCGCG 
TGGATGTTCG GCGGATTGAC CGCCAGCCGG 
TTGCGACTAC AAGAACGCCG ATGTGGCCTG 
TGGCAGAGTT CCTGTATGCG GGCGCGTGGT 
GGCGGTTCCG TGCGTCTGGC CGCGCGCAGC 
GTTCGCGCAG TTCCTGGACG CGCGTTGGCG 
CCGCCTTGCC GGCGCTGTTG CGAGCCTGGG 
GCGCTGTTCG AGCAGGCCGG CGCGCAAGGC 
GGGCGCGGTA TTTGCCGATG CCGACTTGCT 
TCGCGGCCAG TGCGCCGATG GCGGCGTCGG 
CGCAATCTGC ACGACGCCGA GGCCCTGGTG 
CTTGCGTGCG TTGCGCGATC GGGCCATCGC 
AGGTGCGCTG CCTTTGCCAA CAGGTCGTGG 
GCCGGCGACG AGCAGCAATG GCTCGATTAT 
CGGCGAGACC GCCGCGGACC GCATGCTGCG 
GCACGCCTGA GATGCGCCGC GCACAGGCGT 
TCCTAGGATC CGGCACATTC CTTGCCAGGT 
GTGCTCCGCT TCGTACACCT CGTCGGCCGC 
GGTCGAAGCC GATGCGCCGC GCCGTTTCCA 
GGGGCATTCC AGACCTGGCT GATGCTGCGC 
GCGGGCAATG GTCTGGGCGT GGAACATGCC 
32001 TACGCTGGAG TAGTCCGCCT TGGCCAGGCT CATCAACAGG CCGGCCTTGA

32051 CCTCGTCGGA GCCTTGCATC GAGAAACTCG GCACGCGGGC GGCGCGCAGC 

32101 AGCGCGGCGA GCTGCTTGAC GGACGTCGAG GTGATGCCCC GGTGCTCGGT 

32151 GACGTaAAAG GCGTCGACTT CGcTCGACAG CTTCTGGTAG CAAGCCAGAA 

32201 CGTTCTGGGT TGCCGTGGCG ATGGGGATGC CGGTCGCGCG TGCGTCGcAA 

32251 CGCTTGACGG AGAAATCCAA TGCCGGCATT AGTGCGGCGA CCTTATCGAT 

32301 GGCTGCGTAG GTGCGACCTG CTTCGGTGTC TTCGTAGACC AGTCCAAGCG 

32351 TCTTGAaCGG CACGATGTCA TGGAGCAGCT GGATCTGCCG CTGGTAGTGG 

32401 TCGGgCTGTA CCCGGGCATG CAGGTTGTCC TGGCCGCTGT CGGCCGCACT

32451 GGGTATGATC CGGGCGCTTA TcGGGTCGgT CGACGAGACG ACCACGGTGG

32501 GTACCGGCGT GCCCAGTTCG ACCATGTCCT GTCCAGCCCA GGTACCCATG 

32551 GCGATGATCA GGTCGATGTC CTTGGCGCCA TGCAGGCGTg CCGCAACGGC 

32601 TTCGCGCACG GCAGGCCGCA AGGcGGTGTC GAAGTTGCCG GGCTgCCACC

32651 ACGCATCGGG CaCGAACTCG ATGTAGTTGC TGCGGGCATG CGTGGCCAGG

32701 TAAAGCCAGG CCTTTCGCAT ATCGGTTATC TCGGGCATGT CGTCGATACG 

32751 CAGCCATCCG AGTTGTTGCA ATGCGCGCGC GATCGCGTAG AGCGTGCGCG 

32801 GATACTCCTC GTACTCGCCG CTACCCACAT AACCGATGCG CCATTTCCGG 

32851 CCGGATGTAT GGGAGGGAGG CGGAAGGCGG GGCGAGGTAA GGGCGACGCC 

32901 GGAGCTCAGG GCCGCGACAG GAGGAGGGCT GGATGCCGCC GCGATGGGCC

32951 ACGCGAGGCC AAGAAGCAGG GCGAGCGGGG CGAGTATTCC GGGGCGAAGG

33001 GTCATGGGCG ATGAATGGCG ATGATGGTGA GATCGTCGGA TTGTTCGGAA 

33051 TCGGCGGCGA ATTCGCGTAG GTCGTGCAGA ATGTGCTCGA TGAGTTCGGC 

33101 CGCTGCGTGC GGCGCGCCCT GCATCAGGGC GACCAGCCGC GGCAGACCAT 

33151 ACTGGGCGCA GCCGCCGTGG ATGGCTTCGG TGACGCCGTC GGTAAACGCG 

33201 ACCAGCGAGG TGCCGTTCGG CAAGGTGGTG CTCAGGGTGG AATACGCCTC 
33251 GTTGTCCAGC ACGCCGCAGG CCGCGCCGCT GCTTCCTTGA AGCAGGCGGA 
33301 CCTCGCCACG TTCGTCGATG AGCAGCGGCG GCGGGTGGCC GGCGTTGACC 
33351 CAgGCCAGGG CGCCTGTTTC CGGGGTGAAG ACGCCTATCA GCAAGGTGAC 
AAACATCAGC TTGGGGTTGT TCTCGGCCAG ACGGTGGTTC ACCTTGGTGG
33451 CGATGGCGCC CGGGTCGTGC TCTTCTTCCG CCACGCTGCG TATCAAGGTC
33501 CTGACGATGG CCATGAACAG GGCCGCGGGC ACGCCTTTTC CGGATACGTC 
33551


GCCGATGGCA 


AAGCACAGAC 


33601 


AATCCCCACC 


GACCTCCCGG 


33651 


CCGCGCGTGG 


CCGCATCGGG 


33701 


GGAGCGGGCG 


ATGCTCAATT 


33751 


GCGCCATCAG 


GGCCCGCACA 


33801 


AACGATTCGG 


CGAGCTGTCC 


33851 


TGCCACCGAC 


GGCGGAACCC 


33901 


GCTGGCGAGC 


GTAGTTGCTC 


33951 


GCCACCACCC 


ATGCCAGCAT 


34001 


CAGTGCCTGC 


CGGCGCACCA 


34051 


CGGGAACGAC 


ACCGATGATG 


34101 


TCGATCTGCC 


AGGCGCTTTC 


34151 


GGTAGACGAG


ATTTCGGCAA 


34201 


CGTCTGTCGA 


GTCCAGCAGC 


34251 


ACCGTGCCAT 


CGTCCGCAAC 


34301 


CAGCTCCGAC 


AGGTTCCGGT 


34351 


CGGCAACCTT 


GTCGATGATG 


34401 


CACTTCCACG 


CGGGGAAGTA 


34451 


CTCGTCCAGG 


GGAGACGGGT 


34501 


TTTCCTCGTA 


CATGGCGGCG 


34551 


CCggagAGGt 


CCCGGTCGAn 


34601 


nCCTTcCGCG 


TCATAGGCGA 


34651 


GGTTCAGCCA 


GACACGCGCC 


34701 


CCGCGCTCGG 


CCTGCGCGGC 


34751 


GCTGAGTTGG 


ATCAGTTGCC 


34801 


CGTCGAGCAG 


CGTGGACCAG 


34851 


ATGATGTTAC 


TGACGGCATG 


34901 


ATCGCGCTGG 


GTCACGAGCA 


34951 


ATGCAATGAG


CAGGAGGAAC 


35001 


CGGGGAGTAC 


GGCGCATGAA 



GCCCGTCTGC CAGCACGAAG TAGTCGTAGA 
GCCGGGTACA TGACGGCACG CAACTGGCTG 
CAACGGCTGG GGAAGCAGGC CAAGTTGGAT 
CGCTTTCGAG GCGTTCGCGG TTCGATATCT 
TTGTGGTGCA GCTGTTCGTT CATGAACAGG 
GACTTCGTCG CGCCGCCGGC GCGGCAGGCA 
GGATCGGCTC GGTGAGGTCC TGGGTGGGAA 
AGTTGCGCCA ACGGCCGGGC GATGCGCACC 
CAGCCCGGCC AGCAAGGTGG CGGCGAAGAT 
GATTCTGTGC CGGGTCGGTC AGGTCCGGCT 
GTCCAATGCA GCGGCTTGTA TCGCAGGGCG 
GCCGTTGGTA AAGCGCAACG TCAGGCCGCG 
GCATCGAATG CAATACCCGT CCCGATTCGA 
CGGGCGGCCG ATGGGGGTGG CGGCACGATC 
CACGAACACG AAACCATGGC GGCTGAGCCG 
CTATCGCGGC AATCATGTTG GCTTTCTGGG 
GCTTGTGAGC TATCGGAGAT GGCGAGAACC 
CACGAAATAG GCGTGTCGCA TCTGGGCGGA 
AGATGGCGAA GCCGCGACCG TCGTTGCGGC 
GCGAGCGGCC GGCCCTTGAA GTCGCGGATC 
nnnnCGGGGG TTGGTGCTGG CCAGCACnnn 
AGGCGACGCG GCGCGGTCCC AGGTCGAGAT 
ATGCCCTTGG CGGCGCCGGT AGTGACGTGT 
ATAGGCGTTC AGCACCGATG TCACGACCGC 
TGCGGCTTTC CCGGATGGTG CGGATCTTGT 
CGCGTGTCGG TGTCACGAAC TACCAAGTCC 
CAGTTCGTTC TTGATGATGT TGTTCGTGAC 
TCACGACGAT GCCTACGAGT AGCAGCGTGG 
TTCCCACGCA ATGAAAGCGG CAACTCTAGC 
CATGAA 